Abstract: We develop elements of a theory of cooperation and
coordination in networks. Rather than considering a communication
network as a means of distributing information, or of reconstructing
random processes at remote nodes, we ask what dependence can be
established among the nodes given the communication constraints.
Specifically, in a network with communication rates $\{R_{i,j}\}$
between the nodes, we ask what is the set of all achievable joint
distributions $p(x_1, \ldots, x_m)$ of actions at the nodes of the
network. Several networks are solved, including arbitrarily large
cascade networks.

Distributed cooperation can be the solution to many problems 
such as distributed games, distributed control, and establishing 
mutual information bounds on the influence of one part of a 
physical system on another. 

Index Terms: Common randomness, cooperation capacity, coordination
capacity, network dependence, rate distortion, source coding, strong
Markov lemma, task assignment, Wyner common information.



I. Introduction 

COMMUNICATION is required to establish cooperative behavior. In a
network of nodes where relevant information is known at only some
nodes in the network, finding the minimum communication requirements
to coordinate actions can be posed as a network source coding problem.
This diverges from traditional source coding. Rather than focus on
sending data from one point to another with a fidelity constraint, we
consider the communication needed to establish coordination summarized
by a joint probability distribution of behavior among all nodes in the
network.

A large variety of research addresses the challenge of collecting or
moving information in networks. Network coding [1] seeks to
efficiently move independent flows of information over shared
communication links. On the other hand, distributed average consensus
[2] involves collecting related information. Sensors in a network
collectively compute the average of their measurements in a
distributed fashion. The network topology and dynamics determine how
many rounds of communication among neighbors are needed to converge to
the average and how good the estimate will be at each node [3].
Similarly, in the gossiping Dons problem [4], each node starts with a
unique piece of gossip, and one wishes to know how many exchanges of
gossip are required to make everything known to everyone. Computing
functions in a network is considered in [5], [6], and [7].

Our work, introduced in [8], has several distinctions from
the network communication examples mentioned. First, we 
keep the purpose for communication very general, which 
means sometimes we get away with saying very little about 
the information in the network while still achieving the desired 
coordination. We are concerned with the joint distribution of 
actions taken at the various nodes in the network, and the 
"information" that enters the network is nothing more than 
actions that are selected randomly by nature and assigned 
to certain nodes. Secondly, we consider quantization and 
rates of communication in the network, as opposed to only 
counting the number of exchanges. We find that we can gain 
efficiency by using vector quantization specifically tailored to 
the network topology. 

Figure 1 shows an example of a network with rate-limited
communication links. In general, each node in the network performs an
action, where some of these actions are selected randomly by nature.
In this example, the source set $S$ indicates which actions are chosen
by nature: actions $X_1$, $X_2$, and $X_3$ are assigned randomly
according to the joint distribution $p_0(x_1, x_2, x_3)$. Then, using
the communication and common randomness that is available to all
nodes, the actions $Y_1$, $Y_2$, and $Y_3$ outside of $S$ are
produced. We ask which conditional distributions
$p(y_1, y_2, y_3 | x_1, x_2, x_3)$ are compatible with the network
constraints.



Fig. 1. Coordination capacity. This network represents the general
framework we consider. The nodes in this network have rate-limited
links of communication between them. Each node performs an action. The
actions $X_1$, $X_2$, and $X_3$ in the source set $S$ are chosen
randomly by nature according to $p_0(x_1, x_2, x_3)$, while the
actions $Y_1$, $Y_2$, and $Y_3$ are produced based on the
communication and common randomness in the network. What joint
distributions $p_0(x_1, x_2, x_3) p(y_1, y_2, y_3 | x_1, x_2, x_3)$
can be achieved?



A variety of applications are encompassed in this framework. This
could be used to model sensors in a sensor network, sharing
information in the standard sense, while also cooperating in their
transmission of data. Similarly, a wireless ad hoc network can improve
performance by cooperating among nodes to allow beam-forming and
interference alignment. On the other hand, some settings do not
involve moving information in the usual sense. The nodes in the
network might comprise a distributed control system, where the
behavior at each node must be related to the behavior at other nodes
and the information coming into the system. Also, with computing
technology continuing to move in the direction of parallel processing,
even across large networks, a network of computers must coherently
perform computations while distributing the work load across the
participating machines. Alternatively, the nodes might each be agents
taking actions in a multiplayer game.

Network communication can be revisited from the viewpoint of
coordinated actions. Rate distortion theory becomes a special case.
More generally, we ask how we can build dependence among the nodes.
What is it good for? How do we use it?

In this paper we deal with two fundamentally different notions of
coordination, which we distinguish as empirical coordination and
strong coordination, both associated with a desired joint distribution
of actions. Empirical coordination is achieved if the joint type of
the actions in the network (the empirical joint distribution) is close
to the desired distribution. Techniques from rate-distortion theory
are relevant here. Strong coordination instead deals with the joint
probability distribution of the actions. If the actions in the network
are generated randomly so that a statistician cannot reliably
distinguish (as measured by total variation) between the constructed
n-length sequence of actions and random samples from the desired
distribution, then strong coordination is achieved. The approach and
proofs in this framework are related to the common information work by
Wyner [9].

Before developing the mathematical formulation, consider 
the first surprising observation. 

No communication: Suppose we have three nodes choosing 
actions and no communication is allowed between the nodes 
(Fig. 2). We assume that common randomness is available to
all the nodes. What is the set of joint distributions p(x, y, z) 
that can be achieved at these isolated nodes? The answer turns 
out to be any joint distribution whatsoever. The nodes can 
agree ahead of time on how they will behave in the presence 
of common randomness (for example, a time stamp used as a 
seed for a random number generator). Any triple of random 
variables can be created as functions of common randomness. 
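
The observation above can be illustrated with a minimal sketch. It assumes the common randomness is a shared integer seed (standing in for the time stamp mentioned in the text); the target distribution `pmf` and the helper names `sample_triple`, `node_x`, `node_y`, and `node_z` are hypothetical and chosen only for illustration. Each node runs the same deterministic sampler on the same seed and keeps only its own coordinate, so the triple of actions follows the agreed joint distribution without any communication.

```python
import random

def sample_triple(seed, pmf):
    """Draw one (x, y, z) from pmf using only the shared seed."""
    rng = random.Random(seed)  # every node builds the same generator
    triples = list(pmf.keys())
    weights = list(pmf.values())
    return rng.choices(triples, weights=weights, k=1)[0]

# Hypothetical target joint distribution p(x, y, z) on binary triples.
pmf = {(0, 0, 0): 0.5, (1, 1, 0): 0.25, (0, 1, 1): 0.25}

# Each isolated node computes the full triple locally and keeps one entry.
def node_x(seed):
    return sample_triple(seed, pmf)[0]

def node_y(seed):
    return sample_triple(seed, pmf)[1]

def node_z(seed):
    return sample_triple(seed, pmf)[2]

seed = 20240101  # stands in for a shared time stamp
actions = (node_x(seed), node_y(seed), node_z(seed))
```

Because all three nodes evaluate the same function of the same seed, the resulting triple always lies in the support of `pmf`, whatever distribution was agreed on ahead of time.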

This would seem to be the end of the problem, but the 
problem changes dramatically when one of the nodes is 
specified by nature to take on a certain value, as will be the 
case in each of the scenarios following. 

Fig. 2. No communication. Any distribution $p(x, y, z)$ can be
achieved without communication between nodes. Define three random
variables $X(\cdot)$, $Y(\cdot)$, and $Z(\cdot)$ with the appropriate
joint distribution on the standard probability space
$(\Omega, \mathcal{B}, \mathbf{P})$, and let the actions at the nodes
be $X(\omega)$, $Y(\omega)$, and $Z(\omega)$, where
$\omega \in \Omega$ is the common randomness.

An eclectic collection of work, ranging from game theory to quantum
information theory, has a number of close relationships to our
approach and results. For example, Anantharam and Borkar [10] let two
agents generate actions for a multiplayer game based on correlated
observations and common randomness and ask what kind of correlated
actions are achievable. From a quantum mechanics perspective, Barnum
et al. [11] consider quantum coding of mixed quantum states. Kramer
and Savari [12] look at communication for the purpose of
"communicating probability distributions" in the sense that they care
about reconstructing a sequence with the proper empirical distribution
of the sources rather than the sources themselves. Weissman and
Ordentlich [13] make statements about the empirical distributions of
sub-blocks of source and reconstruction symbols in a rate-constrained
setting. And Han and Verdú [14] consider generating a random process
via use of a memoryless channel, while Bennett et al. [15] propose a
"reverse Shannon theorem" stating the amount of noise-free
communication necessary to synthesize a memoryless channel.

In this work, we consider coordination of actions in two- and
three-node networks. These serve as building blocks for understanding
larger networks. Some of the actions at the nodes are given by nature,
and some are constructed by the node itself. We describe the problem
precisely in Section II. For some network settings we characterize the
entire solution, but for others we give partial results including
bounds and solutions to special cases. The complete results are
presented in Section III and include a variant of the multiterminal
source coding problem. Among the partial results of Section IV, a
consistent trend in coordination strategies is identified, and the
golden ratio makes a surprise appearance.

In Section V we consider strong coordination. We characterize the
communication requirements in a couple of settings and discuss the
role of common randomness. If common randomness is available to all
nodes in the network, then empirical coordination and strong
coordination seem to require equivalent communication resources,
consistent with the implications of the "reverse Shannon theorem"
[15]. Furthermore, we can quantify the amount of common randomness
needed, treating common randomness itself as a scarce resource.

Rate-distortion regions are shown to be projections of the
coordination capacity region in Section VI. The proofs for all
theorems are presented together in Section VII, where we introduce a
stronger Markov Lemma (Theorem 12) that may be broadly useful in
network information theory. In our closing remarks we show cases where
this work can be extrapolated to large networks to identify the
efficiency of different network topologies.

II. Empirical Coordination 

In this section and the next we address questions of the following
nature: If three different tasks are to be performed in a shared
effort between three people, but one person is randomly assigned his
responsibility, how much must he tell the others about his assignment
in order to divide the labor?

A. Problem specifics 

The definitions in this section pinpoint the concept of 
empirical coordination. We will consider coordination in a 
variety of two and three node networks. The basic meaning of 
empirical coordination is the same for each network — we use 
the network communication to construct a sequence of actions 
that have an empirical joint distribution closely matching a 
desired distribution. What's different from one problem to the 
next is the set of nodes whose actions are selected randomly 
by nature and the communication limitations imposed by the 
network topology. 

Here we define the problem in the context of the cascade network of
Section III-C, shown in Figure 3. These definitions have obvious
generalizations to other networks.



Fig. 3. Cascade network. Node X is assigned actions $X^n$ chosen by
nature according to $p(x^n) = \prod_{i=1}^n p_0(x_i)$. A message $I$
in the set $\{1, \ldots, 2^{nR_1}\}$ is constructed based on $X^n$ and
the common randomness $\omega$ and sent to Node Y, which constructs
both an action sequence $Y^n$ and a message $J$ in the set
$\{1, \ldots, 2^{nR_2}\}$. Finally, Node Z produces actions $Z^n$
based on the message $J$ and the common randomness $\omega$. This is
summarized in Figure 4.

Fig. 4. Shorthand notation for the cascade network of Figure 3.

In the cascade network of Figure 3, node X has a sequence of actions
$X_1, X_2, \ldots$ specified randomly by nature. Note that a node is
allowed to see all of its actions before it summarizes them for the
next node. Communication is used to give Node Y and Node Z enough
information to choose sequences of actions that are empirically
correlated with $X_1, X_2, \ldots$ according to a desired joint
distribution $p_0(x) p(y, z | x)$. The communication travels in a
cascade, first from Node X to Node Y at rate $R_1$ bits per action,
and then from Node Y to Node Z at rate $R_2$ bits per action.

Specifically, a $(2^{nR_1}, 2^{nR_2}, n)$ coordination code is used as
a protocol to coordinate the actions in the network for a block of $n$
time periods. The coordination code and the distribution of the random
actions $X^n$ induce a joint distribution on the actions in the
network. If the joint type of the actions in the network can be made
arbitrarily close to a desired distribution $p_0(x) p(y, z | x)$ with
high probability, as dictated by the distribution induced by a
$(2^{nR_1}, 2^{nR_2}, n)$ coordination code, then
$p_0(x) p(y, z | x)$ is achievable with the rate pair $(R_1, R_2)$.



Definition 1 (Coordination code). A $(2^{nR_1}, 2^{nR_2}, n)$
coordination code for the cascade network of Figure 3 consists of
four functions: an encoding function

\[ i : \mathcal{X}^n \times \Omega \to \{1, \ldots, 2^{nR_1}\}, \]

a recoding function

\[ j : \{1, \ldots, 2^{nR_1}\} \times \Omega \to \{1, \ldots, 2^{nR_2}\}, \]

and two decoding functions

\[ y^n : \{1, \ldots, 2^{nR_1}\} \times \Omega \to \mathcal{Y}^n, \]
\[ z^n : \{1, \ldots, 2^{nR_2}\} \times \Omega \to \mathcal{Z}^n. \]

Definition 2 (Induced distribution). The induced distribution
$p(x^n, y^n, z^n)$ is the resulting joint distribution of the actions
in the network $X^n$, $Y^n$, and $Z^n$ when a
$(2^{nR_1}, 2^{nR_2}, n)$ coordination code is used.

Specifically, the actions $X^n$ are chosen by nature i.i.d. according
to $p_0(x)$ and independent of the common randomness $\omega$. Thus,
$X^n$ and $\omega$ are jointly distributed according to a product
distribution,

\[ p(x^n, \omega) = p(\omega) \prod_{i=1}^n p_0(x_i). \]

The actions $Y^n$ and $Z^n$ are functions of $X^n$ and $\omega$ given
by implementing the coordination code as

\[ Y^n = y^n(i(X^n, \omega), \omega), \]
\[ Z^n = z^n(j(i(X^n, \omega), \omega), \omega). \]

Definition 3 (Joint type). The joint type $P_{x^n, y^n, z^n}$ of a
tuple of sequences $(x^n, y^n, z^n)$ is the empirical probability mass
function, given by

\[ P_{x^n, y^n, z^n}(x, y, z) \triangleq \frac{1}{n} \sum_{i=1}^n \mathbf{1}\big((x_i, y_i, z_i) = (x, y, z)\big), \]

for all $(x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$,
where $\mathbf{1}$ is the indicator function.



Definition 4 (Total variation). The total variation between two
probability mass functions is half the $L_1$ distance between them,
given by

\[ \| p(x, y, z) - q(x, y, z) \|_{TV} \triangleq \frac{1}{2} \sum_{x, y, z} | p(x, y, z) - q(x, y, z) |. \]
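
The joint type and total variation defined above can be computed directly for short sequences. The following is a minimal sketch; the sequences and the two target distributions are made-up illustrations, and the function names `joint_type` and `total_variation` are ours rather than notation from the paper.

```python
from collections import Counter

def joint_type(xs, ys, zs):
    """Empirical pmf of the triples (x_i, y_i, z_i), as in Definition 3."""
    n = len(xs)
    counts = Counter(zip(xs, ys, zs))
    return {triple: c / n for triple, c in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two pmfs given as dicts, as in Definition 4."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)

# Hypothetical action sequences of length n = 4.
xs = [0, 0, 1, 1]
ys = [0, 1, 1, 1]
zs = [0, 0, 0, 1]
ptype = joint_type(xs, ys, zs)

# A target that the sequences match exactly, and one that they do not.
target_match = {(0, 0, 0): 0.25, (0, 1, 0): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}
target_other = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

tv_match = total_variation(ptype, target_match)
tv_other = total_variation(ptype, target_other)
```

Empirical coordination asks that this total variation, evaluated on the actions produced by the code, become small with high probability as the block length grows.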

Definition 5 (Achievability). A desired distribution
$p_0(x) p(y, z | x)$ is achievable for empirical coordination with the
rate pair $(R_1, R_2)$ if there exists a sequence of
$(2^{nR_1}, 2^{nR_2}, n)$ coordination codes and a choice of
$p(\omega)$ such that the total variation between the joint type of
the actions in the network and the desired distribution goes to zero
in probability (under the induced distribution). That is,

\[ \| P_{X^n, Y^n, Z^n}(x, y, z) - p_0(x) p(y, z | x) \|_{TV} \longrightarrow 0 \quad \text{in probability.} \]



We now define the region of all rate-distribution pairs in
Definition 6 and slice it into rates for a given distribution in
Definition 7 and distributions for a given set of rates in
Definition 8.

Definition 6 (Coordination capacity region). The coordination
capacity region $\mathcal{C}_{p_0}$ for the source distribution
$p_0(x)$ is the closure of the set of rate-coordination tuples
$(R_1, R_2, p(y, z | x))$ that are achievable:

\[ \mathcal{C}_{p_0} \triangleq \mathbf{Cl} \left\{ (R_1, R_2, p(y, z | x)) : p_0(x) p(y, z | x) \text{ is achievable at rates } (R_1, R_2) \right\}. \]

Definition 7 (Rate-coordination region). The rate-coordination region
$\mathcal{R}_{p_0}$ is a slice of the coordination capacity region
corresponding to a fixed distribution $p(y, z | x)$:

\[ \mathcal{R}_{p_0}(p(y, z | x)) \triangleq \{ (R_1, R_2) : (R_1, R_2, p(y, z | x)) \in \mathcal{C}_{p_0} \}. \]

Definition 8 (Coordination-rate region). The coordination-rate region
$\mathcal{P}_{p_0}$ is a slice of the coordination capacity region
corresponding to a tuple of rates $(R_1, R_2)$:

\[ \mathcal{P}_{p_0}(R_1, R_2) \triangleq \{ p(y, z | x) : (R_1, R_2, p(y, z | x)) \in \mathcal{C}_{p_0} \}. \]

B. Preliminary observations

Lemma 1 (Convexity of coordination). $\mathcal{C}_{p_0}$,
$\mathcal{R}_{p_0}$, and $\mathcal{P}_{p_0}$ are all convex sets.

Proof: The coordination capacity region $\mathcal{C}_{p_0}$ is convex
because time-sharing can be used to achieve any point on the chord
between two achievable rate-coordination pairs. Simply combine two
sequences of coordination codes that achieve the two points in the
coordination capacity region by using one code and then the other in a
proportionate manner to achieve any point on the chord. The definition
of joint type in Definition 3 involves an average over time. Thus if
one sequence is concatenated with another sequence, the resulting
joint type is a weighted average of the joint types of the two
composing sequences. Rates of communication also combine according to
the same weighted average. The rate of the resulting concatenated code
is the weighted average of the two rates.

The rate-coordination region $\mathcal{R}_{p_0}$ is the intersection
of the coordination capacity region $\mathcal{C}_{p_0}$ with a
hyperplane, which are both convex sets. Likewise for the
coordination-rate region $\mathcal{P}_{p_0}$. Therefore,
$\mathcal{R}_{p_0}$ and $\mathcal{P}_{p_0}$ are both convex. ■
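
The time-sharing step in the proof of Lemma 1 rests on a simple identity: the joint type of a concatenation is the length-weighted average of the joint types of the pieces, and the rate averages the same way. The sketch below checks this on two made-up blocks of action triples with made-up rates; all numbers and names are illustrative assumptions.

```python
from collections import Counter

def joint_type(seq):
    """Empirical pmf of a sequence of action triples."""
    n = len(seq)
    return {t: c / n for t, c in Counter(seq).items()}

# Hypothetical outputs of two coordination codes and their rates (bits/action).
block_a = [(0, 0, 0)] * 3 + [(1, 1, 0)]        # n_a = 4
block_b = [(1, 1, 1)] * 4 + [(0, 0, 0)] * 2    # n_b = 6
rate_a, rate_b = 1.0, 0.5

n_a, n_b = len(block_a), len(block_b)
lam = n_a / (n_a + n_b)  # fraction of time spent on the first code

# Joint type of the concatenated sequence.
type_cat = joint_type(block_a + block_b)

# Weighted average of the two individual joint types.
type_a, type_b = joint_type(block_a), joint_type(block_b)
type_mix = {t: lam * type_a.get(t, 0.0) + (1 - lam) * type_b.get(t, 0.0)
            for t in set(type_a) | set(type_b)}

# Total bits sent divided by total number of actions.
rate_cat = (n_a * rate_a + n_b * rate_b) / (n_a + n_b)
```

Here `type_cat` and `type_mix` agree entry by entry, and `rate_cat` equals `lam * rate_a + (1 - lam) * rate_b`, which is the chord between the two operating points.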

Common randomness used in conjunction with randomized 
encoders and decoders can be a crucial ingredient for some 
communication settings, such as secure communication. We 
see, for example, in Section V that common randomness is a
valuable resource for achieving strong coordination. However, 
it does not play a necessary role in achieving empirical 
coordination, as the following theorem shows. 

Theorem 2 (Common randomness doesn't help). Any desired distribution
$p_0(x) p(y, z | x)$ that is achievable for empirical coordination
with the rate pair $(R_1, R_2)$ can be achieved with
$\Omega = \emptyset$.

Proof: Suppose that $p_0(x) p(y, z | x)$ is achievable for empirical
coordination with the rate pair $(R_1, R_2)$. Then there exists a
sequence of $(2^{nR_1}, 2^{nR_2}, n)$ coordination codes for which the
expected total variation between the joint type and
$p_0(x) p(y, z | x)$ goes to zero with respect to the induced
distribution. This follows from the bounded convergence theorem since
total variation is bounded by one. By iterated expectation,

\[ E\big[ E\big[ \| P_{X^n, Y^n, Z^n} - p_0(x) p(y, z | x) \|_{TV} \,\big|\, \omega \big] \big] = E \| P_{X^n, Y^n, Z^n} - p_0(x) p(y, z | x) \|_{TV}. \]

Therefore, there exists a value $\omega^*$ such that

\[ E\big[ \| P_{X^n, Y^n, Z^n} - p_0(x) p(y, z | x) \|_{TV} \,\big|\, \omega^* \big] \leq E \| P_{X^n, Y^n, Z^n} - p_0(x) p(y, z | x) \|_{TV}. \]

Define a new coordination code that doesn't depend on $\omega$ and at
the same time doesn't increase the expected total variation:

\[ i^*(x^n) = i(x^n, \omega^*), \]
\[ j^*(i) = j(i, \omega^*), \]
\[ y^{n*}(i) = y^n(i, \omega^*), \]
\[ z^{n*}(j) = z^n(j, \omega^*). \]

This can be done for each $(2^{nR_1}, 2^{nR_2}, n)$ coordination code
for $n = 1, 2, \ldots$. ■
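
The key step of the proof is that some realization of the common randomness performs at least as well as the average. A minimal numeric sketch of that selection follows; the table of conditional expected total variations and the distribution `p_omega` are invented for illustration and do not come from any actual code.

```python
# Hypothetical values of E[ TV | omega ] for a randomized coordination code,
# one entry per realization of the common randomness.
cond_tv = {"w1": 0.12, "w2": 0.05, "w3": 0.20}

# Hypothetical distribution p(omega) over those realizations.
p_omega = {"w1": 0.5, "w2": 0.25, "w3": 0.25}

# Unconditional expectation, by iterated expectation.
avg_tv = sum(p_omega[w] * cond_tv[w] for w in p_omega)

# Pick the best realization among those with positive probability;
# its conditional expectation cannot exceed the average.
w_star = min((w for w in p_omega if p_omega[w] > 0), key=cond_tv.get)

# Fixing omega = w_star yields a deterministic code that is no worse.
assert cond_tv[w_star] <= avg_tv
```

Hard-wiring `w_star` into the encoder, recoder, and decoders gives the code $(i^*, j^*, y^{n*}, z^{n*})$ of the proof.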

C. Generalization 

We investigate empirical coordination in a variety of networks in
Sections III and IV. In each case, we explicitly specify the structure
and implementation of the coordination codes, similar to Definitions 1
and 2, while all other definitions carry over in a straightforward
manner.

We use a shorthand notation in order to illustrate each network
setting with a simple and consistent figure. Figure 4 shows the
shorthand notation for the cascade network of Figure 3. The random
actions that are specified by nature are shown with arrows pointing
down toward the node (represented by a block). Actions constructed by
the nodes themselves are shown coming out of the node with an arrow
downward. Arrows indicating communication from one node to another are
labeled with the rate limits for the communication along those links.

III. Coordination — Complete results 

In this section we present the coordination capacity regions
$\mathcal{C}_{p_0}$ for empirical coordination in four network
settings: a network of two nodes; a cascade network; an isolated node
network; and a degraded source network. Proofs are left to
Section VII. As a consequence of Theorem 2 we need not use common
randomness. Common randomness will only be required when we try to
generate desired distributions over entire n-blocks in Section V.

A. Two nodes 

In the simplest network setting, shown in Figure 5, we consider two
nodes, X and Y. The action X is specified by nature according to
$p_0(x)$, and a message is sent at rate $R$ to node Y.

Fig. 5. Two nodes. The action X is chosen by nature according to
$p_0(x)$. A message is sent to node Y at rate $R$. The coordination
capacity region $\mathcal{C}_{p_0}$ is the set of rate-coordination
pairs where the rate is greater than the mutual information between X
and Y.

The $(2^{nR}, n)$ coordination codes consist of an encoding function

\[ i : \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}, \]

and a decoding function

\[ y^n : \{1, \ldots, 2^{nR}\} \to \mathcal{Y}^n. \]

The actions $X^n$ are chosen by nature i.i.d. according to $p_0(x)$,
and the actions $Y^n$ are functions of $X^n$ given by implementing the
coordination code as

\[ Y^n = y^n(i(X^n)). \]

Theorem 3 (Coordination capacity region). The coordination capacity
region $\mathcal{C}_{p_0}$ for empirical coordination in the two-node
network of Figure 5 is the set of rate-coordination pairs where the
rate is greater than the mutual information between X and Y. Thus,

\[ \mathcal{C}_{p_0} = \{ (R, p(y | x)) : R \geq I(X; Y) \}. \]

Discussion: The coordination capacity region in this setting yields
the rate-distortion result of Shannon [16]. Notice that with no
communication ($R = 0$), only independent distributions
$p_0(x) p(y)$ are achievable, in contrast to the setting of Figure 2,
where none of the actions were specified by nature and all joint
distributions were achievable.

Example 1 (Task assignment). Suppose there are $k$ tasks numbered 1
through $k$. One task is dealt randomly to node X, and node Y needs to
choose one of the remaining tasks. This coordinated behavior can be
summarized by a distribution $p$. The action X is given by nature
according to $p_0(x)$, the uniform distribution on the set
$\{1, \ldots, k\}$. The desired conditional distribution of the action
Y is $p(y | x)$, the uniform distribution on the set of tasks
different from $x$. Therefore, the joint distribution
$p_0(x) p(y | x)$ is the uniform distribution on pairs of differing
tasks from the set $\{1, \ldots, k\}$. Figure 6 illustrates a valid
outcome for $k$ larger than 5.

By applying Theorem 3 we find that the rate-coordination region
$\mathcal{R}_{p_0}(p(y | x))$ is given by

\[ \mathcal{R}_{p_0}(p(y | x)) = \left\{ R : R \geq \log \frac{k}{k - 1} \right\}. \]

Fig. 6. Task assignment in the two-node network. A task from a set of
tasks numbered $1, \ldots, k$ is to be assigned uniquely to each of
the nodes X and Y in the two-node network setting. The task assignment
for X is given randomly by nature. The communication rate
$R \geq \log(k / (k - 1))$ is necessary and sufficient to allow Y to
select a different task from X.

B. Isolated node

Now we derive the coordination capacity region for the isolated-node
network of Figure 7. Node X has an action chosen by nature according
to $p_0(x)$, and a message is sent at rate $R$ from node X to node Y,
from which node Y produces an action. Node Z also produces an action
but receives no communication. What is the set of all achievable
coordination distributions $p(y, z | x)$? At first it seems that the
action at the isolated node Z must be independent of Y, but we will
see otherwise.



Fig. 7. Isolated node. The action X is chosen by nature according to
$p_0(x)$, and a message is sent at rate $R$ from node X to node Y.
Node Z receives no communication. The coordination capacity region
$\mathcal{C}_{p_0}$ is the set of rate-coordination pairs where
$p(x, y, z) = p_0(x) p(z) p(y | x, z)$ and the rate $R$ is greater
than the conditional mutual information between X and Y given Z.



We formalize this problem as follows. The $(2^{nR}, n)$ coordination
codes consist of an encoding function

\[ i : \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}, \]

a decoding function

\[ y^n : \{1, \ldots, 2^{nR}\} \to \mathcal{Y}^n, \]

and a deterministic sequence

\[ z^n \in \mathcal{Z}^n. \]

The actions $X^n$ are chosen by nature i.i.d. according to $p_0(x)$,
and the actions $Y^n$ are functions of $X^n$ given by implementing the
coordination code as

\[ Y^n = y^n(i(X^n)), \]
\[ Z^n = z^n. \]

The coordination capacity region for this network is given 
in the following theorem. As we previously alluded, notice 
that the action Z need not be independent of Y, even though 
there is no communication to node Z. 

Theorem 4 (Coordination capacity region). The coordination capacity
region $\mathcal{C}_{p_0}$ for empirical coordination in the
isolated-node network of Figure 7 is the set of rate-coordination
pairs where Z is independent of X and the rate $R$ is greater than the
conditional mutual information between X and Y given Z. Thus,

\[ \mathcal{C}_{p_0} = \{ (R, p(z) p(y | x, z)) : R \geq I(X; Y | Z) \}. \]
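
To make the rate condition $R \geq I(X; Y | Z)$ of Theorem 4 concrete, the following sketch evaluates the conditional mutual information (in bits) for two extreme choices of $p(y | x, z)$ with binary, uniform, independent X and Z. Both distributions are illustrative assumptions, and `cond_mutual_info` is a helper name introduced here.

```python
from math import log2

def cond_mutual_info(pxyz):
    """I(X;Y|Z) in bits for a joint pmf given as {(x, y, z): prob}."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), p in pxyz.items():
        pz[z] = pz.get(z, 0.0) + p
        pxz[(x, z)] = pxz.get((x, z), 0.0) + p
        pyz[(y, z)] = pyz.get((y, z), 0.0) + p
    return sum(p * log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in pxyz.items() if p > 0)

# Case 1: Y copies Z and ignores the message, so Y = Z.
p_y_equals_z = {(x, z, z): 0.25 for x in (0, 1) for z in (0, 1)}

# Case 2: Y copies X, so Y = X and Y is independent of Z.
p_y_equals_x = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}

rate_case1 = cond_mutual_info(p_y_equals_z)  # no communication needed
rate_case2 = cond_mutual_info(p_y_equals_x)  # one bit per action needed
```

The first case shows that perfect correlation between Y and Z costs no rate, while the second shows that perfect correlation between X and Y costs the full entropy of X.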



Discussion: How can Y and Z have a dependence when 
there is no communication between them? This dependence 
is possible because neither Y nor Z is chosen randomly by 
nature. In an extreme case, we could let node Y ignore the 
incoming message from node X and let the actions at node Y 
and node Z be equal, Y = Z. Thus we can immediately see 
that with no communication the coordination region consists 
of all distributions of the form Po(x)p(y, z). 

If we were to use common randomness $\omega$ to generate the action
sequence $Z^n(\omega)$, then Node Y, which also has access to the
common randomness, can use it to produce correlated actions. This does
not increase the coordination capacity region (see Theorem 2), but it
provides an intuitive understanding of how Y and Z can be correlated.
Without explicit use of common randomness, we select a deterministic
sequence $z^n$ beforehand as part of our codebook and make it known to
all parties.

It is interesting to note that there is a tension between the 
correlation of X and Y and the correlation of Y and Z. 
For instance, if the communication is used to make perfect 
correlation between X and Y then any potential correlation 
between Y and Z is forfeited. 

Within the results for the more general cascade network in the sequel
(Section III-C), we will find that Theorem 4 is an immediate
consequence of Theorem 5 by letting $R_2 = 0$.

Example 2 (Jointly Gaussian). Jointly Gaussian distributions
illustrate the tradeoff between the correlation of X and Y and the
correlation of Y and Z in the isolated-node network. Consider the
portion of the coordination-rate region $\mathcal{P}_{p_0}(R)$ that
consists of jointly Gaussian distributions. If X is distributed
according to $N(0, \sigma_x^2)$, what set of covariance matrices can
be achieved at rate $R$?

So far we have discussed coordination for distribution 
functions with finite alphabets. Extending to infinite alphabet 
distributions, achievability means that any finite quantization 
of the joint distribution is achievable. 

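Using Theorem 4, we bound the correlations as follows:

\[
\begin{aligned}
R &\geq I(X; Y | Z) \\
  &= I(X; Y, Z) \\
  &= \frac{1}{2} \log \frac{|K_x| \, |K_{yz}|}{|K_{xyz}|} \\
  &\overset{(a)}{=} \frac{1}{2} \log \frac{\sigma_x^2 (\sigma_y^2 \sigma_z^2 - \sigma_{yz}^2)}{\sigma_x^2 \sigma_y^2 \sigma_z^2 - \sigma_x^2 \sigma_{yz}^2 - \sigma_z^2 \sigma_{xy}^2} \\
  &\overset{(b)}{=} \frac{1}{2} \log \frac{1 - \rho_{yz}^2}{1 - \rho_{xy}^2 - \rho_{yz}^2}, \qquad (1)
\end{aligned}
\]

where $\rho_{xy}$ and $\rho_{yz}$ are correlation coefficients.
Equality (a) holds because $\sigma_{xz} = 0$ due to the independence
between X and Z. Obtain equality (b) by dividing the numerator and
denominator of the argument of the log by
$\sigma_x^2 \sigma_y^2 \sigma_z^2$.

Unfolding (1) yields a linear tradeoff between $\rho_{xy}^2$ and
$\rho_{yz}^2$, given by

\[ (1 - 2^{-2R})^{-1} \rho_{xy}^2 + \rho_{yz}^2 \leq 1. \]

Thus all correlation coefficients $\rho_{xy}$ and $\rho_{yz}$
satisfying this constraint are achievable at rate $R$.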
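
The Gaussian tradeoff can be checked numerically. The sketch below assumes logarithms to base 2 (rates in bits) and takes the bound in the form $R \geq \frac{1}{2} \log_2 \frac{1 - \rho_{yz}^2}{1 - \rho_{xy}^2 - \rho_{yz}^2}$, which is equivalent to $(1 - 2^{-2R})^{-1} \rho_{xy}^2 + \rho_{yz}^2 \leq 1$. The function names and the sample values of $R$ and $\rho_{yz}$ are illustrative choices.

```python
from math import log2, sqrt

def rate_needed(rho_xy, rho_yz):
    """I(X;Y|Z) in bits for jointly Gaussian X, Y, Z with X independent of Z."""
    return 0.5 * log2((1 - rho_yz**2) / (1 - rho_xy**2 - rho_yz**2))

def max_rho_xy_sq(rate, rho_yz):
    """Largest rho_xy^2 allowed by the linear tradeoff at the given rate."""
    return (1 - 2 ** (-2 * rate)) * (1 - rho_yz**2)

# Hypothetical operating point: one bit per action, rho_yz = 0.5.
rate = 1.0
rho_yz = 0.5
best_sq = max_rho_xy_sq(rate, rho_yz)

# On the boundary of the tradeoff the required rate equals the available rate.
boundary_rate = rate_needed(sqrt(best_sq), rho_yz)
```

Increasing `rho_yz` shrinks `max_rho_xy_sq`, which is the tension between the two correlations described for the isolated-node network.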

C. Cascade 

We now give the coordination capacity region for the cascade of
communication in Figure 8. In this setting, the action at node X is
chosen by nature. A message at rate $R_1$ is sent from node X to node
Y, and subsequently a message at rate $R_2$ is sent from node Y to
node Z based on the message received from node X. Nodes Y and Z
produce actions based on the messages they receive.



Fig. 8. Cascade. The action X is chosen by nature according to
$p_0(x)$. A message is sent from node X to node Y at rate $R_1$. Node
Y produces an action Y and a message to send to node Z based on the
message received from node X. Node Z then produces an action Z based
on the message received from node Y. The coordination capacity region
$\mathcal{C}_{p_0}$ is the set of rate-coordination triples where the
rate $R_1$ is greater than the mutual information between X and
(Y, Z), and the rate $R_2$ is greater than the mutual information
between X and Z.

The formal statement is as follows. The $(2^{nR_1}, 2^{nR_2}, n)$
coordination codes consist of four functions: an encoding function

\[ i : \mathcal{X}^n \to \{1, \ldots, 2^{nR_1}\}, \]

a recoding function

\[ j : \{1, \ldots, 2^{nR_1}\} \to \{1, \ldots, 2^{nR_2}\}, \]

and two decoding functions

\[ y^n : \{1, \ldots, 2^{nR_1}\} \to \mathcal{Y}^n, \]
\[ z^n : \{1, \ldots, 2^{nR_2}\} \to \mathcal{Z}^n. \]

The actions $X^n$ are chosen by nature i.i.d. according to $p_0(x)$,
and the actions $Y^n$ and $Z^n$ are functions of $X^n$ given by
implementing the coordination code as

\[ Y^n = y^n(i(X^n)), \]
\[ Z^n = z^n(j(i(X^n))). \]



This network was considered by Yamamoto [17] in the
context of rate-distortion theory. The same optimal encoding 
scheme from his work achieves the coordination capacity 
region as well. 

Theorem 5 (Coordination capacity region). The coordination capacity
region $\mathcal{C}_{p_0}$ for empirical coordination in the cascade
network of Figure 8 is the set of rate-coordination triples where the
rate $R_1$ is greater than the mutual information between X and
(Y, Z), and the rate $R_2$ is greater than the mutual information
between X and Z. Thus,

\[ \mathcal{C}_{p_0} = \left\{ (R_1, R_2, p(y, z | x)) : \begin{array}{l} R_1 \geq I(X; Y, Z), \\ R_2 \geq I(X; Z) \end{array} \right\}. \]



Discussion: The coordination capacity region C Po meets the 
cut-set bound. The trick to achieving this bound is to first 
specify Z and then specify Y conditioned on Z. 

Example 3 (Task assignment). Consider a task assignment setting where
three tasks are to be assigned without duplication to the three nodes
X, Y, and Z, and the assignment for node X is chosen uniformly at
random by nature. A distribution capturing this coordination behavior
is the uniform distribution over the six permutations of task
assignments. Let $p_0(x)$ be the uniform distribution on the set
$\{1, 2, 3\}$, and let $p(y, z | x)$ give equal probability to both of
the assignments to Y and Z that produce different tasks at the three
nodes. Figure 9 illustrates a valid outcome of the task assignments.



Fig. 9. Task assignment in the cascade network. Three tasks, numbered
1, 2, and 3, are distributed among three nodes X, Y, and Z in the
cascade network setting. The task assignment for X is given randomly
by nature. The rates $R_1 \geq \log 3$ and
$R_2 \geq \log 3 - \log 2$ are required to allow Y and Z to choose
different tasks from X and from each other.

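According to Theorem 5, the rate-coordination region
$\mathcal{R}_{p_0}(p(y, z | x))$ is given by

\[ \mathcal{R}_{p_0}(p(y, z | x)) = \left\{ (R_1, R_2) : \begin{array}{l} R_1 \geq \log 3, \\ R_2 \geq \log 3 - \log 2 \end{array} \right\}. \]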
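
The rate thresholds in the task-assignment examples are mutual informations of uniform distributions and can be verified by direct computation. The sketch below works in bits (base-2 logarithms, an assumption about units) and uses a helper `mutual_info` introduced here. It evaluates $I(X; Y, Z)$ and $I(X; Z)$ for the three-task cascade example, and $I(X; Y)$ for the two-node example with a sample value of $k$.

```python
from math import log2
from itertools import permutations

def mutual_info(pab):
    """I(A;B) in bits for a joint pmf given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in pab.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in pab.items() if p > 0)

# Cascade example: uniform distribution over the six permutations of 3 tasks.
perms = list(permutations([1, 2, 3]))
prob = 1 / len(perms)
p_x_yz, p_x_z = {}, {}
for x, y, z in perms:
    p_x_yz[(x, (y, z))] = prob
    p_x_z[(x, z)] = p_x_z.get((x, z), 0.0) + prob

r1_min = mutual_info(p_x_yz)  # I(X;Y,Z)
r2_min = mutual_info(p_x_z)   # I(X;Z)

# Two-node example: uniform over ordered pairs of distinct tasks from k tasks.
k = 6  # sample value chosen for illustration
p_xy = {(x, y): 1 / (k * (k - 1))
        for x in range(1, k + 1) for y in range(1, k + 1) if x != y}
r_min = mutual_info(p_xy)     # I(X;Y)
```

The computed values match $\log 3$, $\log 3 - \log 2$, and $\log(k / (k - 1))$ respectively.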



D. Degraded source 

Here we present the coordination capacity region for the
degraded-source network shown in Figure 10. Nodes X and Y
each have an action specified by nature, and Y is a function
of X. That is, $p_0(x,y) = p_0(x)\,\mathbf{1}(y = f_0(x))$, where $\mathbf{1}(\cdot)$ is
the indicator function. Node X sends a message to node Y at
rate $R_1$ and a message to node Z at rate $R_2$. Node Y, upon
receiving the message from node X, sends a message at rate
$R_3$ to node Z. Node Z produces an action based on the two
messages it receives.

The $(2^{nR_1}, 2^{nR_2}, 2^{nR_3}, n)$ coordination codes for Figure 10
consist of four functions: two encoding functions

$$i : \mathcal{X}^n \to \{1, \ldots, 2^{nR_1}\},$$
$$j : \mathcal{X}^n \to \{1, \ldots, 2^{nR_2}\},$$

a recoding function

$$k : \{1, \ldots, 2^{nR_1}\} \times \mathcal{Y}^n \to \{1, \ldots, 2^{nR_3}\},$$

and a decoding function

$$z^n : \{1, \ldots, 2^{nR_2}\} \times \{1, \ldots, 2^{nR_3}\} \to \mathcal{Z}^n.$$







Fig. 10. Degraded source: The action X is specified by nature according to
$p_0(x)$, and the action Y is a function $f_0$ of X. A message is sent from node
X to node Y at rate $R_1$, after which node Y constructs a message for node
Z at rate $R_3$ based on the incoming message from node X and the action Y.
Node X also sends a message directly to node Z at rate $R_2$. The coordination
capacity region $\mathcal{C}_{p_0}$ is given in Theorem 6.



The actions $X^n$ and $Y^n$ are chosen by nature i.i.d. according
to $p_0(x,y)$, having the property that $Y_i = f_0(X_i)$ for all i,
and the actions $Z^n$ are a function of $X^n$ and $Y^n$ given by
implementing the coordination code as

$$Z^n = z^n\big(j(X^n),\, k(i(X^n), Y^n)\big).$$



Others have investigated source coding networks in the rate- 
distortion context where two sources are encoded at separate 
nodes to be reconstructed at a third node. Kaspi and Berger
[18] consider a variety of cases where the encoders share
some information. Also, Barros and Servetto [19] articulate
the compress and bin strategy for more general bi-directional
exchanges of information among the encoders. While falling 
under the same general compression strategy, the degraded 
source network is a special case where optimality can be 
established, yielding a characterization of the coordination 
capacity region. 

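Theorem 6 (Coordination capacity region). The coordination
capacity region $\mathcal{C}_{p_0}$ for empirical coordination in the
degraded-source network of Figure 10 is given by

$$
\mathcal{C}_{p_0} = \left\{ (R_1, R_2, R_3, p(z|x,y)) \;:\;
\begin{array}{l}
\exists\, p(u|x,y,z) \text{ such that} \\
|\mathcal{U}| < |\mathcal{X}||\mathcal{Z}| + 2, \\
R_1 > I(X;U|Y), \\
R_2 > I(X;Z|U), \\
R_3 > I(X;U)
\end{array} \right\}.
$$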

IV. Coordination — Partial Results 

We have given the coordination capacity region for several 
multinode networks. Those results are complete. We now 
investigate networks for which we have only partial results. 

In this section we present bounds on the coordination
capacity regions $\mathcal{C}_{p_0}$ for empirical coordination in two network
settings of three nodes: the broadcast network and the
cascade-multiterminal network. A communication technique
that we find useful in both settings, also used in the degraded-source
network of Section III-D, is to use a portion of the
communication to send identical messages to all nodes in
the network. The common message serves to correlate the
codebooks used on different communication links and can
result in reduced rates in the network.






Proofs are left to Section VII. Again, as a consequence
of Theorem 2, we need not use common randomness in this
section.

A. Broadcast 

We now give bounds on the coordination capacity region for
the broadcast network of Figure 11. In this setting, node X has
an action specified by nature according to $p_0(x)$ and sends one
message to node Y at rate $R_1$ and a separate message to node
Z at rate $R_2$. Nodes Y and Z each produce an action based
on the message they receive.





Fig. 11. Broadcast. The action X is chosen by nature according to $p_0(x)$. A
message is sent from node X to node Y at rate $R_1$, and a separate message is
sent from node X to node Z at rate $R_2$. Nodes Y and Z produce actions based
on the messages they receive. Bounds on the coordination capacity region $\mathcal{C}_{p_0}$
are given in Theorem 7.

Node X serves as the controller for the network. Nature 
assigns an action to node X, which then tells node Y and 
node Z which actions to take. 

The $(2^{nR_1}, 2^{nR_2}, n)$ coordination codes consist of two encoding
functions

$$i : \mathcal{X}^n \to \{1, \ldots, 2^{nR_1}\},$$
$$j : \mathcal{X}^n \to \{1, \ldots, 2^{nR_2}\},$$

and two decoding functions

$$y^n : \{1, \ldots, 2^{nR_1}\} \to \mathcal{Y}^n,$$
$$z^n : \{1, \ldots, 2^{nR_2}\} \to \mathcal{Z}^n.$$

The actions $X^n$ are chosen by nature i.i.d. according to
$p_0(x)$, and the actions $Y^n$ and $Z^n$ are functions of $X^n$ given
by implementing the coordination code as

$$Y^n = y^n(i(X^n)), \qquad Z^n = z^n(j(X^n)).$$

From a rate-distortion point of view, the broadcast network
is not a likely candidate for consideration. The problem
separates into two non-interfering rate-distortion problems, and
the relationship between the sequences $Y^n$ and $Z^n$ is ignored
(unless the decoders communicate as in [20]). However, a
related scenario, the problem of multiple descriptions [21],
where the combination of two messages I and J is used to
make a third estimate of the source X, demands consideration
of the relationship between the two messages. In fact, the
communication scheme for the multiple descriptions problem
presented by Zhang and Berger [22] coincides with our inner
bound for the coordination capacity region in the broadcast
network.

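The set of rate-coordination tuples $\mathcal{C}_{p_0,\mathrm{in}}$ is an inner bound
on the coordination capacity region, given by

$$
\mathcal{C}_{p_0,\mathrm{in}} \triangleq \left\{ (R_1, R_2, p(y,z|x)) \;:\;
\begin{array}{l}
\exists\, p(u|x,y,z) \text{ such that} \\
R_1 > I(X;U,Y), \\
R_2 > I(X;U,Z), \\
R_1 + R_2 > I(X;U,Y) + I(X;U,Z) + I(Y;Z|X,U)
\end{array} \right\}.
$$

The set of rate-coordination tuples $\mathcal{C}_{p_0,\mathrm{out}}$ is an outer bound
on the coordination capacity region, given by

$$
\mathcal{C}_{p_0,\mathrm{out}} \triangleq \left\{ (R_1, R_2, p(y,z|x)) \;:\;
\begin{array}{l}
R_1 > I(X;Y), \\
R_2 > I(X;Z), \\
R_1 + R_2 > I(X;Y,Z)
\end{array} \right\}.
$$

Also, define $\mathcal{R}_{p_0,\mathrm{in}}(p(y,z|x))$ and $\mathcal{R}_{p_0,\mathrm{out}}(p(y,z|x))$ to be
the sets of rate pairs in $\mathcal{C}_{p_0,\mathrm{in}}$ and $\mathcal{C}_{p_0,\mathrm{out}}$ corresponding to
the desired distribution $p(y,z|x)$.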

Theorem 7 (Coordination capacity region bounds). The coordination
capacity region $\mathcal{C}_{p_0}$ for empirical coordination in
the broadcast network of Figure 11 is bounded by

$$\mathcal{C}_{p_0,\mathrm{in}} \subset \mathcal{C}_{p_0} \subset \mathcal{C}_{p_0,\mathrm{out}}.$$

Discussion: The regions $\mathcal{C}_{p_0,\mathrm{in}}$ and $\mathcal{C}_{p_0,\mathrm{out}}$ are convex. A
time-sharing random variable can be lumped into the auxiliary
random variable U in the definition of $\mathcal{C}_{p_0,\mathrm{in}}$ to show
convexity.

The inner bound $\mathcal{C}_{p_0,\mathrm{in}}$ is achieved by first sending a
common message, represented by U, to both receivers and then
private messages to each. The common message effectively
correlates the two codebooks to reduce the required rates for
specifying the actions $Y^n$ and $Z^n$. The sum rate takes a
penalty of $I(Y;Z|X,U)$ in order to assure that Y and Z are
coordinated with each other as well as with X.

The outer bound $\mathcal{C}_{p_0,\mathrm{out}}$ is a consequence of applying the
two-node result of Theorem 3 in three different ways, once
for each receiver, and once for the pair of receivers with full
cooperation.

For many distributions, the bounds in Theorem 7 are tight
and the rate-coordination region $\mathcal{R}_{p_0} = \mathcal{R}_{p_0,\mathrm{in}} = \mathcal{R}_{p_0,\mathrm{out}}$.
This is true for all distributions where X, Y, and Z form
a Markov chain in any order. It is also true for distributions
where Y and Z are independent or where X is independent
pairwise with both Y and Z. For each of these cases, Table
I shows the choice of auxiliary random variable U in the
definition of $\mathcal{R}_{p_0,\mathrm{in}}$ that yields $\mathcal{R}_{p_0,\mathrm{in}} = \mathcal{R}_{p_0,\mathrm{out}}$. In case
5, the region $\mathcal{R}_{p_0,\mathrm{in}}$ is optimized by time-sharing between
U = Y and U = Z.
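To see why the bounds meet when Y - X - Z form a Markov chain (case 1 in Table I), note that with U empty the penalty term I(Y;Z|X,U) becomes I(Y;Z|X), which is zero, and the outer sum-rate constraint is already implied by the two individual constraints. The following sketch checks this numerically on a hypothetical binary example; the flip probabilities 0.1 and 0.2 and the helper names are assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import product
from math import log2


def joint_entropy(joint, idx):
    """Entropy (bits) of the coordinates in idx under a pmf {outcome: prob}."""
    marg = defaultdict(float)
    for outcome, prob in joint.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate indices."""
    return (joint_entropy(joint, a + c) + joint_entropy(joint, b + c)
            - joint_entropy(joint, c) - joint_entropy(joint, a + b + c))


# Coordinates: 0 -> X, 1 -> Y, 2 -> Z. Hypothetical example: X is a fair bit,
# and Y, Z are independent noisy copies of X (flip probabilities 0.1 and 0.2).
joint = {}
for x, y, z in product((0, 1), repeat=3):
    joint[(x, y, z)] = 0.5 * (0.9 if y == x else 0.1) * (0.8 if z == x else 0.2)

i_xy = cond_mi(joint, (0,), (1,))            # I(X;Y)
i_xz = cond_mi(joint, (0,), (2,))            # I(X;Z)
i_x_yz = cond_mi(joint, (0,), (1, 2))        # I(X;Y,Z)
penalty = cond_mi(joint, (1,), (2,), (0,))   # I(Y;Z|X), zero for Y - X - Z

inner_sum = i_xy + i_xz + penalty  # inner-bound sum-rate constraint, U empty
outer_sum = i_x_yz                 # outer-bound sum-rate constraint
print(f"I(Y;Z|X) = {penalty:.2e}, inner sum = {inner_sum:.4f}, outer sum = {outer_sum:.4f}")
```

Because the outer sum-rate constraint is no larger than I(X;Y) + I(X;Z), both regions reduce to the same pair of individual rate constraints in this example.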

TABLE I
Known capacity region (cases where $\mathcal{R}_{p_0,\mathrm{in}} = \mathcal{R}_{p_0,\mathrm{out}}$)

          Condition               Auxiliary
Case 1:   Y - X - Z               U = ∅
Case 2:   X - Y - Z               U = Z
Case 3:   X - Z - Y               U = Y
Case 4:   Y ⊥ Z                   U = ∅
Case 5:   X ⊥ Y and X ⊥ Z         U = Y, U = Z

Notice that if $R_2 = 0$ in the broadcast network we find
ourselves in the isolated node setting of Section III-B. Consider
a particular distribution $p_0(x)p(z)p(y|x,z)$ that could
be achieved in the isolated node network. In the setting of the
broadcast network, it might seem that the message from node
X to node Z is useless for achieving $p_0(x)p(z)p(y|x,z)$, since
X and Z are independent. However, this is not the case. For
some desired distributions $p_0(x)p(z)p(y|x,z)$, a positive rate
$R_2$ in the broadcast network actually helps reduce the required
rate $R_1$.

To highlight a specific case where a message to node Z
is useful even though Z is independent of X in the desired
distribution, consider the following. Let $p_0(x)p(z)p(y|x,z)$ be
the uniform distribution over all combinations of binary x,
y, and z with even parity. The variables X, Y, and Z are
each Bernoulli-half and pairwise independent, and
$X \oplus Y \oplus Z = 0$, where $\oplus$ is addition modulo two. This distribution
satisfies both case 4 and case 5 from Table I, so we know
that $\mathcal{R}_{p_0} = \mathcal{R}_{p_0,\mathrm{out}}$. Therefore, the rate-coordination region
$\mathcal{R}_{p_0}(p(y,z|x))$ is characterized by a single inequality,

$$\mathcal{R}_{p_0}(p(y,z|x)) = \{ (R_1, R_2) \in \mathbb{R}_{+}^{2} \;:\; R_1 + R_2 > 1 \text{ bit} \}.$$

The minimum rate $R_1$ needed when no message is sent from
node X to node Z is 1 bit, while the required rate in general
is $1 - R_2$ bits.
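The even-parity example can be verified numerically: every pair of variables is independent, yet X shares a full bit with the pair (Y, Z), so only the sum-rate constraint of the outer bound is active. This is a small illustrative sketch; the helper names are our own.

```python
from collections import defaultdict
from itertools import product
from math import log2


def joint_entropy(joint, idx):
    """Entropy (bits) of the coordinates in idx under a pmf {outcome: prob}."""
    marg = defaultdict(float)
    for outcome, prob in joint.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def mutual_information(joint, a, b):
    """I(A;B) in bits; a and b are tuples of coordinate indices."""
    return (joint_entropy(joint, a) + joint_entropy(joint, b)
            - joint_entropy(joint, a + b))


# Coordinates: 0 -> X, 1 -> Y, 2 -> Z; uniform over binary triples of even parity.
joint = {(x, y, z): 0.25
         for x, y, z in product((0, 1), repeat=3) if (x ^ y ^ z) == 0}

i_xy = mutual_information(joint, (0,), (1,))      # I(X;Y)
i_xz = mutual_information(joint, (0,), (2,))      # I(X;Z)
i_yz = mutual_information(joint, (1,), (2,))      # I(Y;Z)
i_x_yz = mutual_information(joint, (0,), (1, 2))  # I(X;Y,Z)
print(f"I(X;Y)={i_xy:.3f}, I(X;Z)={i_xz:.3f}, I(Y;Z)={i_yz:.3f}, I(X;Y,Z)={i_x_yz:.3f}")
```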

The following task assignment problem has practical im- 
portance. 

Example 4 (Task assignment). Consider a task assignment
setting similar to Example 3, where three tasks are to be
assigned without duplication to the three nodes X, Y, and Z,
and the assignment for node X is chosen uniformly at random
by nature. A distribution capturing this coordination behavior
is the uniform distribution over the six permutations of task
assignments. Let $p_0(x)$ be the uniform distribution on the set
{0, 1, 2}, and let $p(y,z|x)$ give equal probability to both of
the assignments to Y and Z that produce different tasks at the
three nodes. Figure 12 illustrates a valid outcome of the task
assignments.







Fig. 12. Task assignment in the broadcast network. Three tasks, numbered 
0, 1, and 2, are distributed among three nodes X, Y, and Z in the broadcast 
network setting. The task assignment for X is given randomly by nature. What 
rates $R_1$ and $R_2$ are necessary to allow Y and Z to choose different tasks
from X and each other? 

We can explore the achievable rate region $\mathcal{R}_{p_0}(p(y,z|x))$
by using the bounds in Theorem 7. In this process, we find
rates as low as $\log 3 - \log \phi$ to be sufficient on each link,
where $\phi = \frac{\sqrt{5}+1}{2}$ is the golden ratio.







Fig. 13. Rate region bounds for task assignment. Points A, B, C, and D are
achievable rates for the task assignment problem in the broadcast network.
The solid line indicates the outer bound $\mathcal{R}_{p_0,\mathrm{out}}(p(y,z|x))$, and the dashed
line indicates a subset of the inner bound $\mathcal{R}_{p_0,\mathrm{in}}(p(y,z|x))$. Points A and B
are achieved by letting U = ∅. Point C uses U as time-sharing, independent
of X. Point D uses U to describe X partially to each of the nodes Y and Z.

First consider the points in the inner bound
$\mathcal{R}_{p_0,\mathrm{in}}(p(y,z|x))$ that are achieved without the use of
the auxiliary variable U. This consists of a pentagonal region
of rate pairs. The extreme point A = (log(3/2), log 3), shown
in Figure 13, corresponds to a simple communication
approach. First node X coordinates with node Y. Theorem 3
for the two-node network declares the minimum rate needed
to be $R_1 = \log(3/2)$. After action Y has been established,
node X specifies action Z in its entire detail using the rate
$R_2 = \log 3$. A complementary scheme achieves the extreme
point B in Figure 13. The sum rate achieved by these points
is $R_1 + R_2 = 2(\log_2 3 - 1/2)$ bits.
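The sum rate of points A and B can be cross-checked against the bounds of Theorem 7. With U empty, the inner bound reduces to R_1 > I(X;Y), R_2 > I(X;Z), and R_1 + R_2 > I(X;Y) + I(X;Z) + I(Y;Z|X). The sketch below evaluates these quantities, together with the outer sum-rate bound I(X;Y,Z), for the uniform distribution over permutations of {0, 1, 2}. Helper names are our own and rates are in bits.

```python
from collections import defaultdict
from itertools import permutations
from math import log2


def joint_entropy(joint, idx):
    """Entropy (bits) of the coordinates in idx under a pmf {outcome: prob}."""
    marg = defaultdict(float)
    for outcome, prob in joint.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are tuples of coordinate indices."""
    return (joint_entropy(joint, a + c) + joint_entropy(joint, b + c)
            - joint_entropy(joint, c) - joint_entropy(joint, a + b + c))


# Coordinates: 0 -> X, 1 -> Y, 2 -> Z; uniform over the six task permutations.
joint = {perm: 1 / 6 for perm in permutations((0, 1, 2))}

i_xy = cond_mi(joint, (0,), (1,))                  # log(3/2)
i_xz = cond_mi(joint, (0,), (2,))                  # log(3/2)
i_yz_given_x = cond_mi(joint, (1,), (2,), (0,))    # 1 bit

inner_sum_no_u = i_xy + i_xz + i_yz_given_x        # sum rate of points A and B
outer_sum = cond_mi(joint, (0,), (1, 2))           # I(X;Y,Z) = log 3
print(f"inner sum (U empty) = {inner_sum_no_u:.4f} bits, outer sum = {outer_sum:.4f} bits")
```

The gap between the two sum rates is what the later choices of U (points C and D) try to close.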

We can explore more of the inner bound $\mathcal{R}_{p_0,\mathrm{in}}(p(y,z|x))$
by adding the element of time-sharing. That is, use an auxiliary 
variable U that is independent of X. As long as we can assign 
tasks in the network so that X, Y, and Z are each unique, then 
there will be a method of using time-sharing that will achieve 
the desired uniform distribution over unique task assignments 
p. For example, devise six task assignment schemes from the 
one successful scheme by mapping the tasks onto the six 
different permutations of {0,1,2}. By time-sharing equally 
among these six schemes, we achieve the desired distribution. 

With the idea of time-sharing in mind, we achieve a better
sum rate by restricting the domain of Y to {0, 1} and Z to
{0, 2} and letting them be functions of X in the following
way:

$$Y = \begin{cases} 1, & X \neq 1, \\ 0, & X = 1, \end{cases} \qquad (2)$$

$$Z = \begin{cases} 2, & X \neq 2, \\ 0, & X = 2. \end{cases} \qquad (3)$$

We can say that Y takes on a default value of 1, and Z takes
on a default value of 2. Node X just tells nodes Y and Z when
they need to get out of the way, in which case they switch to
task 0. To achieve this we only need $R_1 > H(Y) = \log_2 3 - 2/3$
bits and $R_2 > H(Z) = \log_2 3 - 2/3$ bits, represented by point
C in Figure 13.
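The scheme in equations (2) and (3) can be checked mechanically. The sketch below confirms that the three tasks are always distinct, evaluates H(Y), and verifies that time-sharing equally over the six relabelings of {0, 1, 2} yields the uniform distribution over permutations. The function names `y_of` and `z_of` are our own labels for (2) and (3).

```python
from collections import Counter
from itertools import permutations
from math import log2


def y_of(x):
    """Equation (2): Y defaults to task 1 and moves to task 0 when X = 1."""
    return 0 if x == 1 else 1


def z_of(x):
    """Equation (3): Z defaults to task 2 and moves to task 0 when X = 2."""
    return 0 if x == 2 else 2


# One outcome (x, y, z) per value of X, with X uniform on {0, 1, 2}.
base_scheme = [(x, y_of(x), z_of(x)) for x in range(3)]
all_distinct = all(len(set(triple)) == 3 for triple in base_scheme)

# Y is a deterministic function of X, so the rate needed is H(Y).
p_y0 = sum(1 for _, y, _ in base_scheme if y == 0) / 3
h_y = -(p_y0 * log2(p_y0) + (1 - p_y0) * log2(1 - p_y0))

# Time-share equally over the six relabelings of the tasks.
time_shared = Counter()
for relabel in permutations(range(3)):
    for x, y, z in base_scheme:
        time_shared[(relabel[x], relabel[y], relabel[z])] += 1 / 18

print(f"distinct tasks: {all_distinct}, H(Y) = {h_y:.4f} bits")
print({triple: round(prob, 4) for triple, prob in sorted(time_shared.items())})
```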



Finally, we achieve an even smaller sum rate in the inner
bound $\mathcal{R}_{p_0,\mathrm{in}}(p(y,z|x))$ by using a more interesting choice of
U in addition to time-sharing.¹ Let $U \in \{0, 1, 2\}$ be correlated
with X in such a way that they are equal more often than one
third of the time. Now restrict the domains of Y and Z based
on U. The actions Y and Z are functions of X and U defined
as follows:

$$Y = \begin{cases} U + 1 \bmod 3, & X \neq U + 1 \bmod 3, \\ U, & X = U + 1 \bmod 3, \end{cases} \qquad (4)$$

$$Z = \begin{cases} U - 1 \bmod 3, & X \neq U - 1 \bmod 3, \\ U, & X = U - 1 \bmod 3. \end{cases} \qquad (5)$$



This corresponds to sending a compressed description of X,
represented by U, and then assigning default values to Y and
Z centered around U. The actions Y and Z sit on both sides
of U and only move when X tells them to get out of the way.
The description rates needed for this method are

$$
\begin{aligned}
R_1 &> I(X;U) + I(X;Y|U) = I(X;U) + H(Y|U), \\
R_2 &> I(X;U) + I(X;Z|U) = I(X;U) + H(Z|U). \qquad (6)
\end{aligned}
$$

Using a symmetric conditional distribution from X to U,
calculus provides the following parameters:

$$
P(U = u \mid X = x) =
\begin{cases}
\dfrac{1}{\sqrt{5}}, & u = x, \qquad (7) \\[2mm]
\dfrac{1}{\phi\sqrt{5}}, & u \neq x, \qquad (8)
\end{cases}
$$

where $\phi = \frac{\sqrt{5}+1}{2}$ is the golden ratio. This level of compression
results in a very low rate of description, $I(X;U) \approx 0.04$ bits,
for sending U to each of the nodes Y and Z.

The description rates needed for this method are as follows,
and are represented by point D in Figure 13:

$$
\begin{aligned}
R_1 &> I(X;U) + H(Y|U) \\
    &= \log 3 - \frac{1}{\sqrt{5}} \log \sqrt{5} - \frac{2}{\phi\sqrt{5}} \log\left(\phi\sqrt{5}\right) + H\!\left(\frac{1}{\phi\sqrt{5}}\right) \\
    &= \log 3 - \log \phi, \\
R_2 &> I(X;U) + H(Z|U) \\
    &= \log 3 - \log \phi, \qquad (9)
\end{aligned}
$$

where H is the binary entropy function. The above calculation
is assisted by observing that $\phi = \frac{1}{\phi} + 1$ and $\phi + \frac{1}{\phi} = \sqrt{5}$.
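The rate at point D can be confirmed numerically. The sketch below builds the joint distribution of (X, U, Y, Z) from equations (4) and (5) for a symmetric channel from X to U, then evaluates I(X;U) + H(Y|U). It assumes the off-diagonal probability is (1 - P(U = X))/2, which follows from symmetry and normalization; the function names are our own.

```python
from collections import defaultdict
from math import log2, sqrt

PHI = (sqrt(5) + 1) / 2  # golden ratio


def joint_entropy(joint, idx):
    """Entropy (bits) of the coordinates in idx under a pmf {outcome: prob}."""
    marg = defaultdict(float)
    for outcome, prob in joint.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def build_joint(p_equal):
    """Joint pmf of (X, U, Y, Z) with X uniform and P(U = X | X) = p_equal."""
    joint = {}
    for x in range(3):
        for u in range(3):
            p_u_given_x = p_equal if u == x else (1 - p_equal) / 2
            y = u if x == (u + 1) % 3 else (u + 1) % 3  # equation (4)
            z = u if x == (u - 1) % 3 else (u - 1) % 3  # equation (5)
            joint[(x, u, y, z)] = p_u_given_x / 3
    return joint


def link_rate(p_equal):
    """Return (I(X;U), I(X;U) + H(Y|U)) in bits."""
    joint = build_joint(p_equal)
    i_xu = (joint_entropy(joint, (0,)) + joint_entropy(joint, (1,))
            - joint_entropy(joint, (0, 1)))
    h_y_given_u = joint_entropy(joint, (1, 2)) - joint_entropy(joint, (1,))
    return i_xu, i_xu + h_y_given_u


tasks_distinct = all(len({x, y, z}) == 3 for x, _, y, z in build_joint(1 / sqrt(5)))
i_xu_star, rate_star = link_rate(1 / sqrt(5))
print(f"I(X;U) = {i_xu_star:.4f} bits, rate = {rate_star:.4f} bits, "
      f"log 3 - log phi = {log2(3) - log2(PHI):.4f} bits")
```

Evaluating `link_rate` at nearby values such as 0.40 or 0.50 gives a slightly larger rate, consistent with the stated parameter being a minimizer among symmetric choices.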

¹ Time-sharing is also lumped into U, but we ignore that here to simplify
the explanation.

B. Cascade multiterminal

We now give bounds on the coordination capacity region for
the cascade-multiterminal network of Figure 14. In this setting,
node X and node Y each have an action specified by nature
according to the joint distribution $p_0(x,y)$. Node X sends a
message at rate $R_1$ to node Y. Based on its own action Y and
the incoming message about X, node Y sends a message to
node Z at rate $R_2$. Finally, node Z produces an action based
on the message from node Y.




Fig. 14. Cascade multiterminal. The actions X and Y are chosen by nature
according to $p_0(x,y)$. A message is sent from node X to node Y at rate $R_1$.
Node Y then constructs a message for node Z based on the received message
from node X and its own action. Node Z produces an action based on the
message it receives from node Y. Bounds on the coordination capacity region
$\mathcal{C}_{p_0}$ are given in Theorem 8.

The $(2^{nR_1}, 2^{nR_2}, n)$ coordination codes consist of an encoding
function

$$i : \mathcal{X}^n \to \{1, \ldots, 2^{nR_1}\},$$

a recoding function

$$j : \{1, \ldots, 2^{nR_1}\} \times \mathcal{Y}^n \to \{1, \ldots, 2^{nR_2}\},$$

and a decoding function

$$z^n : \{1, \ldots, 2^{nR_2}\} \to \mathcal{Z}^n.$$

The actions $X^n$ and $Y^n$ are chosen by nature i.i.d. according
to $p_0(x,y)$, and the actions $Z^n$ are functions of $X^n$ and $Y^n$
given by implementing the coordination code as

$$Z^n = z^n\big(j(i(X^n), Y^n)\big).$$

Node Y is playing two roles in this network. It acts partially 
as a relay to send on the message from node X to node Z, while 
at the same time sending a message about its own actions to 
node Z. This situation applies to a variety of source coding 
scenarios. Nodes X and Y might both be sensors in a sensor 
network, or node Y can be thought of as a relay for connecting 
node X to node Z, with side information Y. 

This network is similar to multiterminal source coding 
considered by Berger and Tung [23] in that two sources
of information are encoded in a distributed fashion. In fact,
the expansion to accommodate cooperative encoders [18] can
be thought of as a generalization of our network. However, 
previous work along these lines is missing one key aspect of 
efficiency, which is to partially relay the encoded information 
without changing it. 

Vasudevan, Tian, and Diggavi [24] looked at a similar
cascade communication system with a relay. In their setting,
the relay's information Y is a degraded version of the decoder's
side information, and the decoder is only interested
in recovering X. Because the relay's observations contain no
additional information for the decoder, the relay does not face
the dilemma of mixing in some of the side information into
its outgoing message. In our cascade multiterminal network,
the decoder does not have side information. Thus, the relay is
faced with coalescing the two pieces of information X and
Y into a single message. Other research involving similar
network settings can be found in [25], where Gu and Effros
consider a more general network but with the restriction that
the action Y is a function of the action X, and [26], where
Bakshi et al. identify the optimal rate region for lossless
encoding of independent sources in a longer cascade (line)
network.

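The set of rate-coordination tuples $\mathcal{C}_{p_0,\mathrm{in}}$ is an inner bound
on the coordination capacity region, given by

$$
\mathcal{C}_{p_0,\mathrm{in}} \triangleq \left\{ (R_1, R_2, p(z|x,y)) \;:\;
\begin{array}{l}
\exists\, p(u,v|x,y,z) \text{ such that} \\
p(x,y,z,u,v) = p_0(x,y)\,p(u,v|x)\,p(z|y,u,v), \\
R_1 > I(X;U,V|Y), \\
R_2 > I(X;U) + I(Y,V;Z|U)
\end{array} \right\}.
$$

The set of rate-coordination tuples $\mathcal{C}_{p_0,\mathrm{out}}$ is an outer bound
on the coordination capacity region, given by

$$
\mathcal{C}_{p_0,\mathrm{out}} \triangleq \left\{ (R_1, R_2, p(z|x,y)) \;:\;
\begin{array}{l}
\exists\, p(u|x,y,z) \text{ such that} \\
p(x,y,z,u) = p_0(x,y)\,p(u|x)\,p(z|y,u), \\
|\mathcal{U}| < |\mathcal{X}||\mathcal{Y}||\mathcal{Z}|, \\
R_1 > I(X;U|Y), \\
R_2 > I(X,Y;Z)
\end{array} \right\}.
$$

Also, define $\mathcal{R}_{p_0,\mathrm{in}}(p(z|x,y))$ and $\mathcal{R}_{p_0,\mathrm{out}}(p(z|x,y))$ to be
the sets of rate pairs in $\mathcal{C}_{p_0,\mathrm{in}}$ and $\mathcal{C}_{p_0,\mathrm{out}}$ corresponding to
the desired distribution $p(z|x,y)$.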

Theorem 8 (Coordination capacity region bounds). The coordination
capacity region $\mathcal{C}_{p_0}$ for empirical coordination in the
cascade multiterminal network of Figure 14 is bounded by

$$\mathcal{C}_{p_0,\mathrm{in}} \subset \mathcal{C}_{p_0} \subset \mathcal{C}_{p_0,\mathrm{out}}.$$

Discussion: The regions $\mathcal{C}_{p_0,\mathrm{in}}$ and $\mathcal{C}_{p_0,\mathrm{out}}$ are convex. A
time-sharing random variable can be lumped into the auxiliary
random variable U in the definition of $\mathcal{C}_{p_0,\mathrm{in}}$ to show
convexity.

The inner bound $\mathcal{C}_{p_0,\mathrm{in}}$ is achieved by dividing the message
from node X into two parts. One part, represented by U, is
sent to all nodes, relayed by node Y to node Z. The other
part, represented by V, is sent only to node Y. Then node Y
recompresses V along with Y.

The outer bound $\mathcal{C}_{p_0,\mathrm{out}}$ is a combination of the Wyner-Ziv
[27] bound for source coding with side information at
the decoder, obtained by letting node Y and node Z fully
cooperate, and the two-node bound of Theorem 3, obtained
by letting node X and node Y fully cooperate.

For some distributions, the bounds in Theorem 8 are tight
and the rate-coordination region $\mathcal{R}_{p_0} = \mathcal{R}_{p_0,\mathrm{in}} = \mathcal{R}_{p_0,\mathrm{out}}$.
This is true for all distributions where X - Y - Z form a
Markov chain or Y - X - Z form a Markov chain. In the
first case, where X - Y - Z form a Markov chain, choosing
U = V = ∅ in the definition of $\mathcal{C}_{p_0,\mathrm{in}}$ reduces the region
to all rate pairs such that $R_2 > I(Y;Z)$, which meets the
outer bound $\mathcal{C}_{p_0,\mathrm{out}}$. In the second case, where Y - X - Z
form a Markov chain, choosing U = Z and V = ∅ reduces
the region to all rate pairs such that $R_1 > I(X;Z|Y)$ and
$R_2 > I(X;Z)$, which meets the outer bound. Therefore, we
find as special cases that the bounds in Theorem 8 are tight
if X is a function of Y, if Y is a function of X, or if the
reconstruction Z is a function of X and Y [28].

Table II shows choices of U and V from $\mathcal{R}_{p_0,\mathrm{in}}$ that yield
$\mathcal{R}_{p_0,\mathrm{in}} = \mathcal{R}_{p_0,\mathrm{out}}$ in each of the above cases. In case 3, V is
selected to minimize $R_1$.

TABLE II
Known capacity region (cases where $\mathcal{R}_{p_0,\mathrm{in}} = \mathcal{R}_{p_0,\mathrm{out}}$)

          Condition        Auxiliary
Case 1:   X - Y - Z        U = ∅, V = ∅
Case 2:   Y - X - Z        U = Z, V = ∅
Case 3:   Z = f(X, Y)      U = ∅

Example 5 (Task assignment). Consider again a task assignment
setting similar to Example 3, where three tasks are
to be assigned without duplication to the three nodes X, Y,
and Z, and the assignments for nodes X and Y are chosen
uniformly at random by nature among all pairs of tasks where
$X \neq Y$. A distribution capturing this coordination behavior
is the uniform distribution over the six permutations of task
assignments. Let $p_0(x,y)$ be the distribution obtained by
sampling X and Y uniformly at random from the set {1, 2, 3}
without replacement, and let $p(z|x,y)$ be the degenerate
distribution where Z is the remaining unassigned task in
{1, 2, 3}. Figure 15 illustrates a valid outcome of the task
assignments.










Fig. 15. Task assignment in the cascade multiterminal network. Three tasks, 
numbered 1, 2, and 3, are distributed among three nodes X, Y, and Z in the 
cascade multiterminal network setting. The task assignments for X and Y are 
given randomly by nature but different from each other. What rates $R_1$ and
$R_2$ are necessary to allow Z to choose a different task from both X and Y?

Task assignment in the cascade multiterminal network
amounts to computing a function Z(X, Y), and the bounds
in Theorem 8 are tight in such cases. The rate-coordination
region $\mathcal{R}_{p_0}(p(z|x,y))$ is given by

$$
\mathcal{R}_{p_0}(p(z|x,y)) = \left\{ (R_1, R_2) \;:\;
\begin{array}{l}
R_1 > \log 2, \\
R_2 > \log 3
\end{array} \right\}.
$$

This is achieved by letting U = ∅ and V = X in the definition
of $\mathcal{C}_{p_0,\mathrm{in}}$. To show that this region meets the outer bound
$\mathcal{C}_{p_0,\mathrm{out}}$, make the observation that $I(X;U|Y) \geq I(X;Z|Y)$
in relation to the bound on $R_1$, since X - (Y, U) - Z forms
a Markov chain.
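For Example 5, the inner-bound expressions with U empty and V = X simplify to R_1 > I(X;X|Y) = H(X|Y) and R_2 > I(X,Y;Z). The following sketch evaluates both for the uniform distribution over permutations of {1, 2, 3}; the helper name `joint_entropy` is our own and rates are in bits.

```python
from collections import defaultdict
from itertools import permutations
from math import log2


def joint_entropy(joint, idx):
    """Entropy (bits) of the coordinates in idx under a pmf {outcome: prob}."""
    marg = defaultdict(float)
    for outcome, prob in joint.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(p * log2(p) for p in marg.values() if p > 0)


# Coordinates: 0 -> X, 1 -> Y, 2 -> Z. X and Y are drawn without replacement
# and Z is the remaining task, so (X, Y, Z) is a uniformly random permutation.
joint = {perm: 1 / 6 for perm in permutations((1, 2, 3))}

# R_1 bound: H(X|Y) = H(X,Y) - H(Y).
r1_bound = joint_entropy(joint, (0, 1)) - joint_entropy(joint, (1,))

# R_2 bound: I(X,Y;Z) = H(X,Y) + H(Z) - H(X,Y,Z).
r2_bound = (joint_entropy(joint, (0, 1)) + joint_entropy(joint, (2,))
            - joint_entropy(joint, (0, 1, 2)))

print(f"R_1 > {r1_bound:.4f} bits (log 2), R_2 > {r2_bound:.4f} bits (log 3)")
```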

V. Strong Coordination 

So far we have examined coordination where the goal is to
generate $Y^n$ through communication based on $X^n$ so that the
joint type $P_{X^n,Y^n}(x,y)$ is equal to the desired distribution
$p_0(x)p(y|x)$. This goal relates to the joint behavior at the
nodes in the network averaged over time. There is no imposed
requirement that $Y^n$ be random, and the order of the sequence
of the $(X_i, Y_i)$ pairs doesn't matter.

How different does the problem become if we actually
want the actions at the various nodes in the network to
be random according to a desired joint distribution? In this
vein, we turn to a stronger notion of cooperation which
we call strong coordination. We require that the induced
distribution over the entire coding block $\tilde{p}(x^n,y^n)$ (induced
by the coordination code) be close to the target distribution
$p(x^n,y^n) = \prod_{i=1}^{n} p_0(x_i)p(y_i|x_i)$, so close that a statistician
could not tell the difference, based on $(X^n,Y^n)$, of whether
$(X^n,Y^n) \sim \tilde{p}(x^n,y^n)$ or $(X^n,Y^n) \sim p(x^n,y^n)$.

Clearly this new strong coordination objective is more 
demanding than empirical coordination — after all, if one were 
to generate random actions, i.i.d. in time, according to the 
appropriate joint distribution, then the empirical distribution 
would also follow suit. But in some settings it is crucial 
for the coordinated behavior to be random. For example, 
in situations where an adversary is involved, it might be 
important to maintain a mystery in the sequence of actions 
that are generated in the network. 

Strong coordination has applications in cooperative game
theory, discussed in [30]. Suppose a team shares the same
payoff in a repeated game setting. An opponent who tries to 
anticipate and exploit patterns in the team's combined actions 
will be adequately combatted by strong coordination according 
to a well-chosen joint distribution. 

A. Problem specifics 

Most of the definitions relating to empirical coordination
in Section II-A carry over to strong coordination, including
the notions of coordination codes and induced distributions.
However, in the context of strong coordination, achievability
has nothing to do with the joint type. Here we define strong
achievability to mean that the distribution of the time-sequence
of actions in the network is close in total variation to the
desired joint distribution, i.i.d. in time. We discuss the strong
coordination capacity region $\underline{\mathcal{C}}_{p_0}$, like the region of Definition
6 but instead defined by this notion of strong achievability.

Definition 9 (Strong achievability). A desired distribution
p(x, y, z) is strongly achievable if there exists a sequence
of (non-deterministic) coordination codes such that the total
variation between the induced distribution $\tilde{p}(x^n,y^n,z^n)$ and
the i.i.d. desired distribution goes to zero. That is,

$$\left\| \tilde{p}(x^n,y^n,z^n) - \prod_{i=1}^{n} p(x_i,y_i,z_i) \right\|_{TV} \to 0.$$



A non-deterministic coordination code is a deterministic
code that utilizes an extra argument for each encoder and
decoder which is a random variable independent of all the
other variables and actions. It seems quite reasonable to allow
the encoders and decoders to use private randomness during
the implementation of the coordination code. This allowance
would have also been extended to the empirical coordination
framework of Sections II, III, and IV; however, randomized
encoding and decoding is not beneficial in that framework
because the objective has nothing to do with producing random
actions (appropriately distributed). This claim is similar to
Theorem 2. Thus, non-deterministic coordination codes do not
improve the empirical coordination capacity over deterministic
coordination codes.

Common randomness plays a crucial role in achieving
strong coordination. For instance, in a network with no
communication, only independent actions can be generated at
each node without common randomness, but actions can be
generated according to any desired joint distribution if enough
common randomness is available, as is illustrated in Figure 2
of Section I. In addition, for each desired joint distribution we
can identify a specific bit-rate of common randomness that
must be available to the nodes in the network. This motivates
us to deal with common randomness more precisely.

Aside from the communication in the network, we allow
common randomness to be supplied to each node. However,
to quantify the amount of common randomness, we limit it to
a rate of $R_0$ bits per action. For an n-block coordination code,
$\omega$ is uniformly distributed on the set $\Omega = \{1, \ldots, 2^{nR_0}\}$. In this
way, common randomness is viewed as a resource alongside
communication.

B. Preliminary observations 

The strong coordination capacity region $\underline{\mathcal{C}}_{p_0}$ is not convex
in general. This becomes immediately apparent when we con- 
sider a network with no communication and without any com- 
mon randomness. An arbitrary joint distribution is not strongly 
achievable without communication or common randomness, 
but any extreme point in the probability simplex corresponds 
to a degenerate distribution that is trivially achievable. Thus 
we see that convex combinations of achievable points in 
the strong coordination capacity region are not necessarily 
strongly achievable, and cannot be achieved through simple 
time-sharing as was done for empirical coordination. 
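The non-convexity argument can be illustrated with a toy two-node case. Without communication or common randomness, each node draws its action independently, so the per-letter joint distribution must be a product distribution. Point masses are products, but their mixtures generally are not. The sketch below uses mutual information as the test for product form; the specific distributions are hypothetical examples chosen for illustration.

```python
from collections import defaultdict
from math import log2


def mutual_information(joint):
    """I(X;Y) in bits for a pmf over pairs {(x, y): prob}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), prob in joint.items():
        p_x[x] += prob
        p_y[y] += prob
    return sum(prob * log2(prob / (p_x[x] * p_y[y]))
               for (x, y), prob in joint.items() if prob > 0)


# Two extreme points of the simplex: degenerate (deterministic) distributions.
point_a = {(0, 0): 1.0}
point_b = {(1, 1): 1.0}

# Their equal-weight convex combination.
mixture = {(0, 0): 0.5, (1, 1): 0.5}

mi_a = mutual_information(point_a)      # 0: product form, trivially achievable
mi_b = mutual_information(point_b)      # 0: product form, trivially achievable
mi_mix = mutual_information(mixture)    # 1 bit: not a product distribution
print(f"I at point a = {mi_a}, at point b = {mi_b}, at mixture = {mi_mix}")
```

Since the mixture has nonzero mutual information, it cannot be produced by independent actions, even though both endpoints can.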

We use total variation as a measurement of fidelity for 
the distribution of the actions in the network. This has a 
number of implications. If two distributions have a small total 
variation between them, then a hypothesis test cannot reliably 
tell them apart. Additionally, the expected value of a bounded 
function of these random variables cannot differ by much. 
Steinberg and Verdú, for example, also use total variation
as one of a handful of fidelity criteria when considering the
simulation of random variables in [11]. On the other hand,
Wyner used normalized relative entropy as his measurement of
error for generating random variables in [9]. Neither quantity,
total variation or normalized relative entropy, is dominated by 
the other in general (because of the normalization). However, 
relative entropy would give infinite penalty if the support of 
the block-distribution of actions is not contained in the support 
of the desired joint distribution. We find cases where the rates 
required under the constraint of normalized relative entropy 
going to zero are unpleasantly high. For instance, lossless 
source coding would truly have to be lossless, with zero error. 
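The contrast between the two fidelity measures shows up in a one-symbol toy example: an induced distribution that places a tiny mass outside the support of the desired distribution has small total variation but infinite relative entropy. The sketch assumes the convention that total variation is half the L1 distance (the normalization is not restated in this section), and the distributions are hypothetical.

```python
from math import inf, log2


def total_variation(p, q):
    """Total variation distance, taken here as half the L1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def relative_entropy(p, q):
    """D(p || q) in bits; infinite when p puts mass where q has none."""
    total = 0.0
    for k, p_k in p.items():
        if p_k == 0:
            continue
        q_k = q.get(k, 0.0)
        if q_k == 0:
            return inf
        total += p_k * log2(p_k / q_k)
    return total


eps = 1e-6
desired = {"a": 1.0}                  # desired distribution: always "a"
induced = {"a": 1 - eps, "b": eps}    # induced distribution: rare error "b"

tv = total_variation(induced, desired)
kl = relative_entropy(induced, desired)
print(f"total variation = {tv:.2e}, relative entropy = {kl}")
```

This mirrors the remark about lossless source coding: any nonzero error probability that leaves the desired support is penalized without bound by relative entropy.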

Based on the success of random codebooks in information
theory and source coding in particular, it seems hopeful that we
might always be able to use common randomness to augment
a coordination code intended for empirical coordination to
result in a randomized coordination code that achieves strong
coordination. Bennett et al. demonstrate this principle for
the two-node setting with their reverse Shannon theorem
[13]. They use common randomness to generate a random
codebook. Then the encoder synthesizes a memoryless channel
and finds a sequence in the codebook with the same joint
type as the synthesized output. Will methods like this work in
other network coordination settings as well? The following
conjecture makes this statement precise and is consistent
with both networks considered for strong coordination in this
section of the paper.

Conjecture 1 (Strong meets empirical coordination). With
enough common randomness, for instance if $\omega \sim \mathrm{Unif}[0, 1]$,
the strong coordination capacity region is the same as the
empirical coordination capacity region for any specific network
setting. That is,

With unlimited common randomness: $\underline{\mathcal{C}}_{p_0} = \mathcal{C}_{p_0}$.

If Conjecture 1 is true, then results regarding empirical
coordination should influence strong coordination schemes, and
strong coordination capacity regions will reduce to empirical
coordination capacity regions under the appropriate limit.

C. No communication 

Here we characterize the strong coordination capacity region
$\underline{\mathcal{C}}$ for the no communication network of Figure 16. A collection
of nodes X, Y, and Z generate actions according to the
joint distribution p(x, y, z) using only common randomness
(and private randomization). The strong coordination capacity
region characterizes the set of joint distributions that can be
achieved with common randomness at a rate of $R_0$ bits per
action.


Fig. 16. No communication. Three nodes generate actions X, Y, and 
Z according to p(x, y, z) without communication. The rate of common 
randomness needed is characterized in Theorem [9] 

Wyner considered a two-node setting in [9], where correlated
random variables are constructed based on common
randomness. He found the amount of common randomness 
needed and named the quantity "common information." Here 
we extend that result to three nodes, and the conclusion for 
any number of nodes is immediately apparent. 

The n-block coordination codes consist of three non-deterministic
decoding functions,

$$x^n : \{1, \ldots, 2^{nR_0}\} \to \mathcal{X}^n,$$
$$y^n : \{1, \ldots, 2^{nR_0}\} \to \mathcal{Y}^n,$$
$$z^n : \{1, \ldots, 2^{nR_0}\} \to \mathcal{Z}^n.$$

Each function can use private randomization to probabilistically
map the common random bits $\omega$ to action sequences.
That is, the functions $x^n(\omega)$, $y^n(\omega)$, and $z^n(\omega)$ behave according
to conditional probability mass functions $p(x^n|\omega)$,
$p(y^n|\omega)$, and $p(z^n|\omega)$.

The rate region given in Theorem 9 can be generalized to
any number of nodes. 

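Theorem 9 (Strong coordination capacity region). The strong
coordination capacity region $\underline{\mathcal{C}}$ for the no communication
network of Figure 16 is given by

$$\underline{\mathcal{C}} = \left\{ p(x,y,z) \;:\; \begin{array}{l} \exists\, p(u|x,y,z) \text{ such that} \\ p(x,y,z,u) = p(u)\,p(x|u)\,p(y|u)\,p(z|u), \\ |\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}||\mathcal{Z}|, \\ R_0 \ge I(X,Y,Z;U) \end{array} \right\}.$$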

Discussion: The proof of Theorem 9, sketched in Section VII,
follows nearly the same steps as Wyner's common
information proof. This generalization can be interpreted as 
a proposed measurement of common information between a 
group of random variables. Namely, the amount of common 
randomness needed to generate a collection of random vari- 
ables at isolated nodes is the amount of common information 
between them. However, it would also be interesting to con- 
sider a richer problem by allowing each subset of nodes to have 
an independent common random variable and investigating all 
of the rates involved. 

Example 6 (Task assignment). Suppose there are tasks numbered
$1, \ldots, k$, and three of them are to be assigned randomly
to the three nodes X, Y, and Z without duplication. That
is, the desired distribution p(x, y, z) for the three actions in
the network is the distribution obtained by sampling X, Y,
and Z uniformly at random from the set $\{1,\ldots,k\}$ without
replacement. The three nodes do not communicate but have
access to common randomness at a rate of $R_0$ bits per
action. We want to determine the infimum of rates $R_0$ required
to strongly achieve p(x, y, z). Figure 17 illustrates a valid
outcome of the task assignments.



[Figure 17: a sample assignment of distinct tasks to nodes X, Y, and Z, labeled k > 6.]

Fig. 17. Random task assignment with no communication. A task from a set
of tasks numbered $1, \ldots, k$ is to be assigned randomly but uniquely to each
of the nodes X, Y, and Z without any communication between them. The rate
of common randomness needed to accomplish this is roughly $R_0 \ge 3 \log 3$
for large k.

Theorem 9 tells us which values of $R_0$ will result in
$p(x,y,z) \in \underline{\mathcal{C}}$. We must optimize over distributions of an
auxiliary random variable U. Two things come into play
to make this optimization manageable: the variables X, Y,
and Z are all conditionally independent given U, and the
distribution p has sparsity. For any particular value of U,
the conditional supports of X, Y, and Z must be disjoint.
Therefore,



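$$\begin{aligned} I(X,Y,Z;U) &= H(X,Y,Z) - H(X,Y,Z|U) \\ &= H(X,Y,Z) - \mathbf{E}\left[ H(X,Y,Z|U=u) \right] \\ &\ge H(X,Y,Z) - \mathbf{E}\left[ \log(k_{1,U}\, k_{2,U}\, k_{3,U}) \right], \end{aligned}$$

where $k_{1,U}$, $k_{2,U}$, and $k_{3,U}$, the sizes allotted to the conditional
supports of X, Y, and Z given U, are integers that sum to k for all
U. Therefore, we maximize $\log(k_{1,U}\, k_{2,U}\, k_{3,U})$ by letting the
three integers be as close to equal as possible. Furthermore,
it is straightforward to find a joint distribution that meets this
inequality with equality.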

If k, the number of tasks, is divisible by three, then we see
that $p(x,y,z) \in \underline{\mathcal{C}}$ for values of
$R_0 \ge 3 \log 3 - \log\left(\frac{k}{k-1}\right) - \log\left(\frac{k}{k-2}\right)$.
No matter how large k is, the required rate never
exceeds $3 \log 3$.
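As a rough numerical check of the bound in Example 6, the following Python sketch evaluates $H(X,Y,Z) - \log(k_1 k_2 k_3)$ with the three integers chosen as close to equal as possible. The function names `task_assignment_rate` and `closed_form_rate` are hypothetical, and logarithms are taken base 2 on the assumption that rates are measured in bits.

```python
import math


def task_assignment_rate(k: int) -> float:
    """Common randomness rate (bits per action) suggested by Example 6.

    Evaluates H(X,Y,Z) - log2(k1*k2*k3), where k1 + k2 + k3 = k and the
    three integers are as close to equal as possible.
    """
    if k < 3:
        raise ValueError("need at least three tasks")
    # X, Y, Z are drawn without replacement, so there are k(k-1)(k-2)
    # equally likely outcomes.
    joint_entropy = math.log2(k * (k - 1) * (k - 2))
    # Split the k tasks into three nearly equal groups.
    base, extra = divmod(k, 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    return joint_entropy - math.log2(sizes[0] * sizes[1] * sizes[2])


def closed_form_rate(k: int) -> float:
    """Closed form from the text, valid when k is divisible by three."""
    return 3 * math.log2(3) - math.log2(k / (k - 1)) - math.log2(k / (k - 2))
```

For k divisible by three the two functions should agree, and the rate should stay below $3 \log_2 3 \approx 4.75$ bits for every k.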

D. Two nodes 

We can revisit the two-node network from Section III-A and
ask what communication rate is needed for strong coordination.
In this network the action at node X is specified by nature
according to $p_0(x)$, and a message is sent from node X to node
Y at rate R. Common randomness is also available to both
nodes at rate $R_0$. The common randomness is independent of
the action X.



[Figure 18: Node X observes $X \sim p_0(x)$ and sends a message at rate R to Node Y; both nodes share common randomness, $|\Omega| = 2^{nR_0}$.]

Fig. 18. Two nodes. The action at node X is specified by nature according
to $p_0(x)$, and a message is sent from node X to node Y at rate R.
Common randomness is also available to both nodes at rate $R_0$. The common
randomness is independent of the action X. The strong coordination capacity
region $\underline{\mathcal{C}}_{p_0}$ depends on the amount of common randomness available. With no
common randomness, $\underline{\mathcal{C}}_{p_0}$ contains all rate-coordination pairs where the rate
is greater than the common information between X and Y. With enough
common randomness, $\underline{\mathcal{C}}_{p_0}$ contains all rate-coordination pairs where the rate is
greater than the mutual information between X and Y.



The rates $R_0$ and R required for strong coordination in
the two-node network are characterized in [30] and were
independently discovered by Bennett et al. [32] in the context
of synthesizing a memoryless channel. Here we take particular
note of the two extremes: what is the strong coordination
capacity region when no common randomness is present, and
how much common randomness is enough to maximize the
strong coordination capacity region?

The $(2^{nR}, n)$ coordination codes consist of a
non-deterministic encoding function,

$$i : \mathcal{X}^n \times \{1,\ldots,2^{nR_0}\} \to \{1,\ldots,2^{nR}\},$$

and a non-deterministic decoding function,

$$y^n : \{1,\ldots,2^{nR}\} \times \{1,\ldots,2^{nR_0}\} \to \mathcal{Y}^n.$$

Both functions can use private randomization to probabilistically
map the arguments onto the range of the function.
That is, the encoding function $i(x^n,\omega)$ behaves according to
a conditional probability mass function $p(i|x^n,\omega)$, and the
decoding function $y^n(i,\omega)$ behaves according to a conditional
probability mass function $p(y^n|i,\omega)$.

The actions $X^n$ are chosen by nature i.i.d. according to
$p_0(x)$, and the actions $Y^n$ are constructed by implementing
the non-deterministic coordination code as

$$Y^n = y^n(i(X^n,\omega),\omega).$$

Let us define two quantities before stating the result. The
first is Wyner's common information $C(X;Y)$ [9], which
turns out to be the communication rate requirement for strong
coordination in the two-node network when no common
randomness is available:

$$C(X;Y) \triangleq \min_{U \,:\, X - U - Y} I(X,Y;U),$$

where the notation $X - U - Y$ represents a Markov chain
from X to U to Y. The second quantity we call necessary
conditional entropy $H(Y \dagger X)$, which we will show to be
the amount of common randomness needed to maximize the
strong coordination capacity region in the two-node network:

$$H(Y \dagger X) \triangleq \min_{f \,:\, X - f(Y) - Y} H(f(Y)|X).$$

Theorem 10 (Strong coordination capacity region). With
no common randomness, $R_0 = 0$, the strong coordination
capacity region $\underline{\mathcal{C}}_{p_0}$ for the two-node network of Figure 18
is given by

$$\underline{\mathcal{C}}_{p_0} = \{ (R, p(y|x)) : R \ge C(X;Y) \}.$$

On the other hand, if and only if the rate of common randomness
is greater than the necessary conditional entropy,
$R_0 \ge H(Y \dagger X)$, the strong coordination capacity region $\underline{\mathcal{C}}_{p_0}$
for the two-node network of Figure 18 is given by

$$\underline{\mathcal{C}}_{p_0} = \{ (R, p(y|x)) : R \ge I(X;Y) \}.$$



Discussion: The proof of Theorem 10, found in Section VII,
is an application of Theorem 3.1 in [30]. This theorem is
consistent with Conjecture 1: with enough common randomness,
the strong coordination capacity region $\underline{\mathcal{C}}_{p_0}$ is the same as the
coordination capacity region $\mathcal{C}_{p_0}$ found in Section III-A.

For many joint distributions, the necessary conditional entropy
$H(Y \dagger X)$ will simply equal the conditional entropy
$H(Y|X)$.

Example 7 (Task assignment). Consider again a task assignment
setting similar to Example 6, where tasks are numbered
$1, \ldots, k$ and are to be assigned randomly to the two nodes X
and Y without duplication. The action X is supplied by nature,
uniformly at random ($p_0(x)$), and the desired distribution
p(y|x) for the action Y is the uniform distribution over all
tasks not equal to X. Figure 19 illustrates a valid outcome of
the task assignments.

[Figure 19: task assignment in the two-node network, labeled k > 7; node X sends a message at rate R to node Y, and both share common randomness, $|\Omega| = 2^{nR_0}$.]

Fig. 19. Task assignment in the two-node network. A task from a set of
tasks numbered $1, \ldots, k$ is to be assigned randomly but uniquely to each of
the nodes X and Y in the two-node network. The task assignment for X
is given by nature. Common randomness at rate $R_0$ is available to both
nodes, and a message is sent from node X to node Y at rate R. When
no common randomness is available, the required communication rate is
$R \ge 2 - \log\left(\frac{k}{k-1}\right)$ bits (for even k). At the other extreme, if the rate
of common randomness is greater than $\log(k-1)$, then $R \ge \log\left(\frac{k}{k-1}\right)$
suffices.



To apply Theorem 10 we must evaluate the three quantities
$I(X;Y)$, $C(X;Y)$, and $H(Y \dagger X)$. For the joint distribution
$p_0(x)p(y|x)$, the necessary conditional entropy $H(Y \dagger X)$ is
exactly the conditional entropy $H(Y|X)$. The computation of
the common information $C(X;Y)$ follows the same steps as
the derivation found in Example 6. Let $\lceil k \rceil$ take the value of
k rounded up to the nearest even number.

$$\begin{aligned} I(X;Y) &= \log\left(\frac{k}{k-1}\right), \\ C(X;Y) &= 2 \text{ bits} - \log\left(\frac{\lceil k \rceil}{\lceil k \rceil - 1}\right), \\ H(Y \dagger X) &= \log(k-1). \end{aligned}$$



Without common randomness, we find that the communication
rate $R \ge 2 \text{ bits} - \log\left(\frac{\lceil k \rceil}{\lceil k \rceil - 1}\right)$ is necessary to strongly
achieve $p_0(x)p(y|x)$. The strong coordination capacity region
$\underline{\mathcal{C}}_{p_0}$ expands as the rate of common randomness $R_0$ increases.
Additional common randomness is no longer useful when
$R_0 > \log(k-1)$. With this amount of common randomness,
only the communication rate $R \ge \log\left(\frac{k}{k-1}\right)$ is necessary to
strongly achieve $p_0(x)p(y|x)$.
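The three quantities in Example 7 can be tabulated with a short Python sketch. The function names are hypothetical, base-2 logarithms are assumed (bits), and `common_information_by_splitting` reproduces the counting argument of Example 6 with two nearly equal groups as an independent check of the closed form.

```python
import math


def two_node_quantities(k: int):
    """Return (I(X;Y), C(X;Y), H(Y dagger X)) in bits for Example 7."""
    if k < 2:
        raise ValueError("need at least two tasks")
    k_even = k + (k % 2)  # k rounded up to the nearest even number
    mutual_information = math.log2(k / (k - 1))
    common_information = 2.0 - math.log2(k_even / (k_even - 1))
    necessary_conditional_entropy = math.log2(k - 1)
    return mutual_information, common_information, necessary_conditional_entropy


def common_information_by_splitting(k: int) -> float:
    """H(X,Y) - log2(k1*k2) with k1 + k2 = k as balanced as possible."""
    k1, k2 = k // 2, k - k // 2
    return math.log2(k * (k - 1)) - math.log2(k1 * k2)
```

For example, `two_node_quantities(8)` gives the rate needed with ample common randomness, the rate needed with none, and the amount of common randomness beyond which more does not help.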

VI. Rate-distortion Theory 

The challenge of describing random sources of information 
with the fewest bits possible can be defined in a number of dif- 
ferent ways. Traditionally, source coding in networks follows 
the path of rate-distortion theory by establishing multiple dis- 
tortion penalties for the multiple sources and reconstructions 
in the network. Yet, fundamentally, the rate-distortion problem 
is intimately connected to empirical coordination. 

The basic result of rate-distortion theory for a single memoryless
source states that in order to achieve any desired
distortion level you must find an appropriate conditional
distribution of the reconstruction $\hat{X}$ given the source X
and then use a communication rate larger than the mutual
information $I(X;\hat{X})$. This lends itself to the interpretation
that optimal encoding for a rate-distortion setting really comes
down to coordinating a reconstruction sequence with a source
sequence according to a selected joint distribution. Here we
make that observation formal by showing that in general, even
in networks, the rate-distortion region is a projection of the
coordination capacity region.



The coordination capacity region $\mathcal{C}_{p_0}$ is a set of
rate-coordination tuples. We can express rate-coordination tuples
as vectors. For example, in the cascade network of
Section III-C there are two rates $R_1$ and $R_2$. The actions
in this network are X, Y, and Z, where X is given
by nature. Order the space $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ in a sequence
$(x_1,y_1,z_1), \ldots, (x_m,y_m,z_m)$, where $m = |\mathcal{X}||\mathcal{Y}||\mathcal{Z}|$. The
rate-coordination tuples $(R_1, R_2, p(y,z|x))$ can be expressed
as vectors $[R_1, R_2, p(y_1,z_1|x_1), \ldots, p(y_m,z_m|x_m)]^T$.

The rate-distortion region $\mathcal{D}_{p_0}$ is the closure of the set of
rate-distortion tuples that are achievable in a network. We say
that a distortion D is achievable if there exists a rate-distortion
code that gives an expected average distortion less than D,
using d as a distortion measurement. For example, in the
cascade network of Section III-C we might have two distortion
functions: the function $d_1(x,y)$ measures the distortion in the
reconstruction at node Y; the function $d_2(x,y,z)$ evaluates
distortion jointly between the reconstructions at nodes Y and
Z. The rate-distortion region $\mathcal{D}_{p_0}$ would consist of tuples
$(R_1, R_2, D_1, D_2)$, which indicate that using rates $R_1$ and $R_2$
in the network, a source distributed according to $p_0(x)$ can
be encoded to achieve no more than $D_1$ expected average
distortion as measured by $d_1$ and $D_2$ distortion as measured
by $d_2$.

The relationship between the rate-distortion region $\mathcal{D}_{p_0}$
and the coordination capacity region $\mathcal{C}_{p_0}$ is that of a linear
projection. Suppose we have multiple finite-valued distortion
functions $d_1, \ldots, d_k$. We construct a distortion matrix D using
the same enumeration $(x_1,y_1,z_1), \ldots, (x_m,y_m,z_m)$ of the
space $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ as was used to vectorize the tuples in $\mathcal{C}_{p_0}$:

$$D \triangleq \begin{bmatrix} d_1(x_1,y_1,z_1)\,p_0(x_1) & \cdots & d_1(x_m,y_m,z_m)\,p_0(x_m) \\ \vdots & & \vdots \\ d_k(x_1,y_1,z_1)\,p_0(x_1) & \cdots & d_k(x_m,y_m,z_m)\,p_0(x_m) \end{bmatrix}.$$

The distortion matrix D is embedded in a block diagonal
matrix A where the upper-left block is the identity matrix I
with the same dimension as the number of rates in the network:

$$A \triangleq \begin{bmatrix} I & 0 \\ 0 & D \end{bmatrix}.$$


Theorem 11 (Rate-distortion region). The rate-distortion region
$\mathcal{D}_{p_0}$ for a memoryless source with distribution $p_0$ in any
rate-limited network is a linear projection of the coordination
capacity region $\mathcal{C}_{p_0}$ by the matrix A,

$$\mathcal{D}_{p_0} = A\, \mathcal{C}_{p_0}.$$

We treat the elements of $\mathcal{D}_{p_0}$ and $\mathcal{C}_{p_0}$ as vectors, as discussed,
and the matrix multiplication by A is the standard set multiplication.
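To make the projection concrete, here is a minimal Python sketch of applying A to a single vectorized rate-coordination tuple. The function name `project_to_rate_distortion` and its argument layout are illustrative assumptions, not notation from the paper: the rates pass through the identity block unchanged, and each row of D turns the conditional distribution into an expected distortion.

```python
from itertools import product


def project_to_rate_distortion(rates, cond_pmf, p0, distortions,
                               alphabet_x, alphabet_y, alphabet_z):
    """Apply A = diag(I, D) to the vector [rates, p(y,z|x) entries].

    cond_pmf maps (x, y, z) to p(y,z|x); p0 maps x to p0(x);
    distortions is a list of functions d(x, y, z).
    """
    triples = list(product(alphabet_x, alphabet_y, alphabet_z))
    expected = []
    for d in distortions:
        # One row of D has entries d(x,y,z) * p0(x); its inner product
        # with the vector of p(y,z|x) values is the expected distortion.
        expected.append(sum(d(x, y, z) * p0[x] * cond_pmf[(x, y, z)]
                            for (x, y, z) in triples))
    # The identity block leaves the rates untouched.
    return list(rates) + expected
```

A usage example under the same assumptions: with a uniform binary X, a trivial Z, and Y equal to X with probability 0.9, the Hamming distortion row yields 0.1.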

Discussion: The proof of Theorem 11 can be found in
Section VII. Since the coordination capacity region $\mathcal{C}_{p_0}$ is a
convex set, the rate-distortion region $\mathcal{D}_{p_0}$ is also a convex set.

Clearly we can use a coordination code to achieve the
corresponding distortion in a rate-distortion setting. But the
theorem makes a stronger statement. It says that there is
not a more efficient way of satisfying distortion limits in
any network setting with memoryless sources than by using 
a code that produces the same joint type for almost every 
observation of the sources. It is conceivable that a rate- 
distortion code for a network setting would produce a variety 
of different joint types, each satisfying the distortion limit, but 
varying depending on the particular source sequence observed. 
However, given such a rate-distortion code, repeated uses 
will produce a longer coordination code that consistently 
achieves coordination according to the expected joint type. 
The expected joint type of a good rate-distortion code can be 
shown to satisfy the distortion constraints. 




[Figure 20: plot of the coordination-rate region, with an axis labeled P(Y = 1 | X = 0).]

Fig. 20. Coordination capacity and rate-distortion. The coordination-rate 
region for a uniform binary source X and binary action Y, where X is 
described at rate R = 0.1 bits to node Y in the two-node network. The shaded 
region shows distributions with Hamming distortion less than D, where D is 
chosen to satisfy R(D) =0.1 bits. 

Geometrically, each distortion constraint defines a hyperplane
that divides the coordination-rate region into two sets:
one that satisfies the distortion constraint and one that does
not. Therefore, minimizing the distortion for fixed rates in
the network amounts to finding optimal extreme points in the
coordination-rate region in the directions orthogonal to these
hyperplanes. Figure 20 shows the coordination-rate region for
R = 0.1 bits in the two-node network of Section III-A with a
uniform binary source X and binary Y. The figure also shows
the region satisfying a Hamming distortion constraint D.

VII. Proofs 

A. Empirical Coordination - Achievability (Sections III, IV)

For a distribution p(x), define the typical set $\mathcal{T}_\epsilon^{(n)}$ with
respect to p(x) to be sequences $x^n$ whose types are ε-close to
p(x) in total variation. That is,

$$\mathcal{T}_\epsilon^{(n)} \triangleq \{ x^n \in \mathcal{X}^n : \| P_{x^n}(x) - p(x) \|_{TV} < \epsilon \}. \qquad (10)$$

This definition is almost the same as the definition of the
strongly typical set $\mathcal{A}_\epsilon^{*(n)}$ found in (10.106) of Cover and
Thomas [33], and it shares the same important properties.
The difference is that here we give a total variation constraint
($L_1$ distance) on the type of the sequence rather than an
element-wise constraint ($L_\infty$ distance).² We deal with $\mathcal{T}_\epsilon^{(n)}$
since it relates more closely to the definition of achievability
in Definition 5. However, the sets are almost the same, as the
following sandwich suggests:

$$\mathcal{A}_\epsilon^{*(n)} \subset \mathcal{T}_\epsilon^{(n)} \subset \mathcal{A}_{\epsilon|\mathcal{X}|}^{*(n)}.$$

² Additionally, our definition of the typical set handles the zero probability
events more liberally, but this doesn't present any serious complications.

A jointly typical set with respect to a joint distribution
p(x, y) inherits the same definition as (10), where total variation
of the type is measured with respect to the joint distribution.
Thus, achieving empirical coordination with respect to
a joint distribution is a matter of constructing actions that are
ε-jointly typical (i.e. in the jointly typical set $\mathcal{T}_\epsilon^{(n)}$) with high
probability for arbitrary ε.
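The total variation definition of typicality translates directly into a membership test. The Python sketch below is illustrative only; the helper names (`empirical_type`, `total_variation`, `is_typical`, `is_jointly_typical`) are hypothetical, and distributions are represented as plain dictionaries from symbols to probabilities.

```python
from collections import Counter


def empirical_type(sequence):
    """Type of a sequence: the relative frequency of each symbol."""
    n = len(sequence)
    return {symbol: count / n for symbol, count in Counter(sequence).items()}


def total_variation(p, q):
    """Total variation distance, i.e. half the L1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def is_typical(sequence, pmf, eps):
    """True if the type of the sequence is eps-close to pmf in total variation."""
    return total_variation(empirical_type(sequence), pmf) < eps


def is_jointly_typical(xs, ys, joint_pmf, eps):
    """Joint typicality: apply the same test to the sequence of pairs."""
    return is_typical(list(zip(xs, ys)), joint_pmf, eps)
```

Joint typicality of longer tuples follows the same pattern by zipping more sequences together and using tuple-valued keys in the joint distribution.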

1) Strong Markov Lemma: If $X - Y - Z$ form a Markov
chain, and the pair of sequences $x^n$ and $y^n$ are jointly
typical as well as the pair of sequences $y^n$ and $z^n$, it is not
true in general that the three sequences $x^n$, $y^n$, and $z^n$ are
jointly typical as a triple. For instance, consider any triple
$(x^n,y^n,z^n)$ that is jointly typical with respect to a non-Markov
joint distribution having marginal distributions p(x, y)
and p(y, z). However, the Markov Lemma [23] states that if
$Z^n$ is randomly distributed according to $\prod_{i=1}^n p(z_i|y_i)$, then
with high probability it will be jointly typical with both $x^n$ and
$y^n$. This lemma is used to establish joint typicality in source
coding settings where side information is not known to the
encoder. Yet, for a network and encoding scheme that is more
intricate, the standard Markov Lemma lacks the necessary
strength. Here we introduce a generalization that will help
us analyze the layers of "piggy-back"-style codes [34] used in
our achievability proofs.³

Theorem 12 (Strong Markov Lemma). Given a joint distribution
p(x, y, z) on the finite alphabet $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ that yields a
Markov chain $X - Y - Z$ (i.e. $p(x,y,z) = p(y)p(x|y)p(z|y)$),
let $x^n$ and $y^n$ be arbitrary sequences that are ε-jointly typical.
Suppose that $Z^n$ is randomly chosen from the set of $z^n$ sequences
that are ε-jointly typical with $y^n$ and additionally that
the distribution of $Z^n$ is permutation-invariant with respect to
$y^n$, which is to say, any two sequences $z^n$ and $\tilde{z}^n$ of the same
joint type with $y^n$ have the same probability. That is,

$$P_{y^n,z^n} = P_{y^n,\tilde{z}^n} \;\Rightarrow\; P(Z^n = z^n) = P(Z^n = \tilde{z}^n). \qquad (11)$$

Then,

$$\Pr\left( (x^n,y^n,Z^n) \in \mathcal{T}_{4\epsilon}^{(n)} \right) \ge \xi_n,$$

where $\xi_n \to 1$ exponentially fast as n goes to infinity.

Notice that permutation invariance is a condition satisfied
by most random codebook based proof techniques; for instance,
encoding schemes based on i.i.d. codebooks tend to
be permutation invariant. To recover the familiar Markov
Lemma, let $Z^n$ have a distribution based on $y^n$ according
to $\prod_{i=1}^n p(z_i|y_i)$, where $y^n$ is an ε-typical sequence. Due to
the A.E.P., $y^n$ and $Z^n$ will be 2ε-jointly typical with high
probability. Furthermore, Theorem 12 can be invoked because
the distribution is permutation invariant.

³ Through conversation we discovered that similar effort is being made by
Young-Han Kim and Abbas El Gamal and may soon be found in the Stanford
EE478 Lecture Notes.

The key to proving Theorem 12 is found in Lemma 13,
which uses permutation invariance and counting arguments to
show that most realizations look empirically Markov.

Lemma 13 (Markov Tendency). Let $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$
be arbitrary sequences. Suppose that the random sequence
$Z^n \in \mathcal{Z}^n$ has a distribution that is permutation-invariant with
respect to $y^n$, as in (11). Then with high probability, which
only depends on the sizes of the alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, the
joint type $P_{x^n,y^n,Z^n}$ will be ε-close to the Markov joint type
$P_{x^n,y^n} P_{Z^n|y^n}$. That is, for any ε > 0,

$$\| P_{x^n,y^n,Z^n} - P_{x^n,y^n} P_{Z^n|y^n} \|_{TV} < \epsilon, \qquad (12)$$

with a probability of at least $1 - 2^{-\alpha n + \beta \log n}$, where α and
β only depend on the alphabet sizes and ε.
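The quantity bounded in (12) can be computed directly from three sequences. The following Python sketch, with the hypothetical name `markov_gap`, measures how far the joint type is from the Markov joint type built from the (x, y) type and the conditional type of z given y. It is only a way to evaluate the distance for given sequences and does not reproduce the probabilistic statement of the lemma.

```python
from collections import Counter


def markov_gap(xs, ys, zs):
    """Total variation between the joint type and its Markov version.

    Returns || P_{x,y,z} - P_{x,y} P_{z|y} ||_TV for three equal-length
    sequences, where all types are empirical frequencies.
    """
    n = len(xs)
    if n == 0 or not (len(ys) == len(zs) == n):
        raise ValueError("sequences must be non-empty and of equal length")
    joint = Counter(zip(xs, ys, zs))
    pair_xy = Counter(zip(xs, ys))
    pair_yz = Counter(zip(ys, zs))
    count_y = Counter(ys)
    gap = 0.0
    for (a, b) in pair_xy:
        for (b2, c) in pair_yz:
            if b2 != b:
                continue
            # P_{x,y}(a,b) * P_{z|y}(c|b) = (N(a,b)/n) * (N(b,c)/N(b))
            markov = pair_xy[(a, b)] * pair_yz[(b, c)] / (n * count_y[b])
            gap += abs(joint[(a, b, c)] / n - markov)
    return 0.5 * gap
```

When z is a copy of x and y is constant, the gap is large; when z is arranged independently of x within each y-class, the gap shrinks toward zero.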

Proof of Theorem 12: The proof of Theorem 12 relies
mainly on Lemma 13 and repeated use of the triangle
inequality. From Lemma 13 we know that with probability
approaching one as n tends to infinity, inequality (12) is
satisfied, namely,

$$\| P_{x^n,y^n,Z^n} - P_{x^n,y^n} P_{Z^n|y^n} \|_{TV} < \epsilon.$$

In this event, we now show that

$$(x^n,y^n,Z^n) \in \mathcal{T}_{4\epsilon}^{(n)}.$$

By the definition of total variation one can easily show that

$$\| P_{x^n,y^n} P_{Z^n|y^n} - p_{X,Y} P_{Z^n|y^n} \|_{TV} = \| P_{x^n,y^n} - p_{X,Y} \|_{TV} < \epsilon.$$

Similarly,

$$\| p_Y\, p_{X|Y} P_{Z^n|y^n} - P_{y^n}\, p_{X|Y} P_{Z^n|y^n} \|_{TV} = \| p_Y - P_{y^n} \|_{TV} < \epsilon.$$

And finally,

$$\| P_{y^n,Z^n}\, p_{X|Y} - p_{X,Y,Z} \|_{TV} = \| P_{y^n,Z^n} - p_{Y,Z} \|_{TV} < \epsilon.$$

Thus, the triangle inequality gives

$$\| P_{x^n,y^n,Z^n} - p_{X,Y,Z} \|_{TV} < 4\epsilon.$$

■
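For readers who want the final step of the proof of Theorem 12 spelled out, the four bounds can be chained as follows. This is a sketch of the telescoping argument; it relies on the identities $p_{X,Y} = p_Y\, p_{X|Y}$ and $P_{y^n} P_{Z^n|y^n} = P_{y^n,Z^n}$, and on the Markov factorization $p_{X,Y,Z} = p_{Y,Z}\, p_{X|Y}$.

```latex
\begin{align*}
\| P_{x^n,y^n,Z^n} - p_{X,Y,Z} \|_{TV}
 &\le \| P_{x^n,y^n,Z^n} - P_{x^n,y^n} P_{Z^n|y^n} \|_{TV} \\
 &\quad + \| P_{x^n,y^n} P_{Z^n|y^n} - p_{X,Y} P_{Z^n|y^n} \|_{TV} \\
 &\quad + \| p_Y\, p_{X|Y} P_{Z^n|y^n} - P_{y^n}\, p_{X|Y} P_{Z^n|y^n} \|_{TV} \\
 &\quad + \| P_{y^n,Z^n}\, p_{X|Y} - p_{X,Y,Z} \|_{TV} \\
 &< 4\epsilon .
\end{align*}
```

Each of the four terms on the right is less than ε, the first by Lemma 13 and the remaining three by the typicality assumptions on $(x^n,y^n)$ and $(y^n,Z^n)$.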

Proof of Lemma 13: We start by defining two constants
that simplify this discussion. The first constant, α, is the key
to obtaining the uniform bound that Lemma 13 provides.

$$\alpha \triangleq \min_{p(x,y,z) \in \mathcal{S}_{\mathcal{X},\mathcal{Y},\mathcal{Z}} \,:\, \| p(x,y,z) - p(x,y)p(z|y) \|_{TV} \ge \epsilon} I(X;Z|Y),$$
$$\beta \triangleq 2|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|.$$

Here $\mathcal{S}_{\mathcal{X},\mathcal{Y},\mathcal{Z}}$ is the simplex with dimension corresponding to
the product of the alphabet sizes. Notice that α is defined as
a minimization of a continuous function over a compact set;
therefore, by analysis we know that the minimum is achieved
in the set. Since $I(X;Z|Y)$ is positive for any distribution that
does not form a Markov chain $X - Y - Z$, we find that α is
positive for ε > 0. The constants α and β are functions of ε
and the alphabet sizes $|\mathcal{X}|$, $|\mathcal{Y}|$, and $|\mathcal{Z}|$.

We categorize sequences into sets with the same joint type.
The type class $T_{p(y,z)}$ is defined as

$$T_{p(y,z)} \triangleq \{ (y^n,z^n) : P_{y^n,z^n} = p(y,z) \}.$$

We also define a conditional type class $T_{p(z|y)}(y^n)$ to be the
set of $z^n$ sequences such that the pair $(y^n,z^n)$ are in the type
class $T_{p(y,z)}$. Namely,

$$T_{p(z|y)}(y^n) \triangleq \{ z^n : P_{y^n,z^n} = p(z|y) P_{y^n} \}.$$

We will show that the statement made in (12) is true
conditionally for each conditional type class $T_{p(z|y)}(y^n)$ and
therefore must be true overall.

Suppose $Z^n$ falls in the conditional type class $T_{P_{z^n|y^n}}(y^n)$.
By assumption (11), all $z^n$ in this type class are equally likely.
Assessing probabilities simply becomes a matter of counting.
From the method of types [33] we know that

$$|T_{P_{z^n|y^n}}(y^n)| \ge n^{-|\mathcal{Y}||\mathcal{Z}|}\, 2^{n H_{P_{y^n,z^n}}(Z|Y)}.$$

We also can bound the number of $z^n$ sequences in
$T_{P_{z^n|y^n}}(y^n)$ that do not satisfy (12). These sequences must
fall in a conditional type class $T_{P_{z^n|x^n,y^n}}(x^n,y^n)$ where

$$\| P_{x^n,y^n,z^n} - P_{x^n,y^n} P_{z^n|y^n} \|_{TV} \ge \epsilon.$$

For each such type class, the size can be bounded by

$$\begin{aligned} |T_{P_{z^n|x^n,y^n}}(x^n,y^n)| &\le 2^{n H_{P_{x^n,y^n,z^n}}(Z|X,Y)} \\ &= 2^{n ( H_{P_{y^n,z^n}}(Z|Y) - I_{P_{x^n,y^n,z^n}}(X;Z|Y) )} \\ &\le 2^{n ( H_{P_{y^n,z^n}}(Z|Y) - \alpha )}. \end{aligned}$$

Furthermore, there are only polynomially many types,
bounded by $n^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}$. Therefore, the probability that $Z^n$ does
not satisfy (12) for any conditional type $P_{z^n|y^n}$ is bounded by

$$\begin{aligned} \Pr\left( \text{not (12)} \,\middle|\, Z^n \in T_{P_{z^n|y^n}}(y^n) \right) &= \frac{ |\{ z^n \in T_{P_{z^n|y^n}}(y^n) : \text{not (12)} \}| }{ |T_{P_{z^n|y^n}}(y^n)| } \\ &\le \frac{ n^{|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}\, 2^{n ( H_{P_{y^n,z^n}}(Z|Y) - \alpha )} }{ n^{-|\mathcal{Y}||\mathcal{Z}|}\, 2^{n H_{P_{y^n,z^n}}(Z|Y)} } \\ &= n^{|\mathcal{Y}||\mathcal{Z}| + |\mathcal{X}||\mathcal{Y}||\mathcal{Z}|}\, 2^{-\alpha n} \\ &\le 2^{-\alpha n + \beta \log n}. \end{aligned}$$

■

2) Generic Achievability Proof: The coding techniques for
achieving the empirical coordination regions in Sections III
and IV are familiar from rate distortion theory. For the proofs,
we construct random codebooks for communication and show
that the resulting encoding schemes perform well on average,
producing jointly-typical actions with high probability. This
proves that there must be at least one deterministic scheme that
performs well. Here we prove one generally useful example to
verify that the rate-distortion techniques actually do work for
achieving empirical coordination. The technique here is very
similar to the source coding technique of "piggy-back" codes
introduced by Wyner [34].

Consider the two-node source coding setting of Figure 21
with arbitrary sequences $x^n$, $y^n$, and $z^n$ that are ε-jointly
typical according to a joint distribution p(x,y,z). The sequences
$x^n$ and $y^n$ are available to the encoder at node 1,
while $y^n$ and $z^n$ are available to the decoder at node 2. We
can think of $x^n$ as the source to be encoded and $y^n$ and $z^n$
as side information known to either both nodes or the decoder
only, respectively. Communication from node 1 to node 2 at
rate R is used to produce a sequence $U^n$. Original results
related to this setting in the context of rate-distortion theory
can be found in the work of Wyner and Ziv [27]. Here we
analyze a randomized coding scheme that attempts to produce
a sequence $U^n$ at the decoder such that $(x^n,y^n,z^n,U^n)$
are (8ε)-jointly typical with respect to a joint distribution of
the form p(x,y,z)p(u|x,y). We give a scheme that uses a
communication rate of $R > I(X;U|Y,Z)$ and is successful
with probability approaching one as n tends to infinity for all
jointly typical sequences $x^n$, $y^n$, and $z^n$.



[Figure 21: Node 1 observes $x^n, y^n$ and sends a message $i \in [2^{nR}]$ to Node 2, which observes $y^n, z^n$.]

Fig. 21. Two nodes with side information. This network represents a generic
source coding setting encountered in networks and will illustrate standard
encoding techniques. The sequences $x^n$, $y^n$, and $z^n$ are jointly typical with
respect to $p_0(x,y,z)$. Only $x^n$ and $y^n$ are observed by the encoder at node
1. A message is sent to specify $U^n$ to node 2 at rate R. A randomized coding
scheme can produce $U^n$ to be jointly typical with $(x^n,y^n,z^n)$ with respect
to a Markov chain $Z - (X,Y) - U$ with high probability, regardless of the
particular sequences $x^n$, $y^n$, and $z^n$, as long as the rate is greater than the
conditional mutual information $I(X;U|Y,Z)$.

The $(2^{nR}, n)$ coordination codes consist of a randomized
encoding function

$$i : \mathcal{X}^n \times \mathcal{Y}^n \times \Omega \to \{1,\ldots,2^{nR}\},$$

and a randomized decoding function

$$u^n : \{1,\ldots,2^{nR}\} \times \mathcal{Y}^n \times \mathcal{Z}^n \times \Omega \to \mathcal{U}^n.$$

These functions are random simply because the common
randomness ω is involved for generating random codebooks.

The sequences $x^n$, $y^n$, and $z^n$ are arbitrary jointly typical
sequences according to $p_0(x,y,z)$, and the sequence $U^n$ is a
randomized function of $x^n$, $y^n$, and $z^n$ given by implementing
the coordination code as

$$U^n = u^n(i(x^n,y^n,\omega), y^n, z^n, \omega).$$

Lemma 14 (Generic Coordination with Side Information). For
the two-node network with side information of Figure 21 and
any discrete joint distribution of the form p(x,y,z)p(u|x,y),
there exists a function δ(ε) which goes to zero as ε goes to zero
such that, for any ε > 0 and rate $R > I(X;U|Y,Z) + \delta(\epsilon)$,
there exists a sequence of randomized coordination codes at
rate R for which

$$\Pr\left( (x^n,y^n,z^n,U^n) \in \mathcal{T}_{8\epsilon}^{(n)} \right) \to 1$$

as n goes to infinity, uniformly for all $(x^n,y^n,z^n) \in \mathcal{T}_\epsilon^{(n)}$.

Proof: Consider a joint distribution p(x,y,z)p(u|x,y)
and define γ to be the excess rate, $\gamma = R - I(X;U|Y,Z)$. The
conditions of Lemma 14 require that γ > δ(ε) for some δ(ε)
that goes to zero as ε goes to zero. We will identify a valid
function δ(ε) at the conclusion of the following analysis.

We first over-cover the typical set of $(x^n,y^n)$ using a
codebook of size $2^{nR_c}$, where $R_c = I(X,Y;U) + \gamma/2$. We
then randomly categorize the codebook sequences into $2^{nR}$
bins, yielding roughly $2^{nR_b}$ sequences in each bin, where

$$\begin{aligned} R_b &= R_c - R \\ &= I(X,Y;U) - I(X;U|Y,Z) - \gamma/2 \\ &= I(X,Y,Z;U) - I(X;U|Y,Z) - \gamma/2 \\ &= I(Y,Z;U) - \gamma/2. \end{aligned}$$
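The chain of equalities for the bin rate can be sanity-checked numerically. The Python sketch below draws a random distribution of the form p(x,y,z)p(u|x,y) on binary alphabets and compares $R_c - R$ with $I(Y,Z;U) - \gamma/2$. All function names are hypothetical helpers, and the value of the excess rate `gamma` is arbitrary; the check only illustrates the identity under these assumptions.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, axes):
    """Marginalize a joint pmf over tuples onto the listed coordinates."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, a, b, given=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    h = lambda axes: entropy(marginal(joint, axes))
    return h(a + given) + h(b + given) - h(a + b + given) - h(given)


def random_joint(rng, size=2):
    """Random p(x,y,z)p(u|x,y) on {0..size-1}^4, so Z - (X,Y) - U holds."""
    triples = list(itertools.product(range(size), repeat=3))
    weights = [rng.random() + 0.01 for _ in triples]
    total = sum(weights)
    p_xyz = {t: w / total for t, w in zip(triples, weights)}
    joint = {}
    for x, y in itertools.product(range(size), repeat=2):
        u_weights = [rng.random() + 0.01 for _ in range(size)]
        u_total = sum(u_weights)
        for z in range(size):
            for u in range(size):
                joint[(x, y, z, u)] = p_xyz[(x, y, z)] * u_weights[u] / u_total
    return joint


# Coordinates of the outcome tuples: X=0, Y=1, Z=2, U=3.
X, Y, Z, U = (0,), (1,), (2,), (3,)
joint = random_joint(random.Random(0))
gamma = 0.1                                              # excess rate
rate = mutual_information(joint, X, U, given=Y + Z) + gamma
r_c = mutual_information(joint, X + Y, U) + gamma / 2    # codebook rate
r_b = r_c - rate                                         # per-bin rate
```

The second equality in the derivation is where the Markov chain $Z - (X,Y) - U$ is used, which is why the sampled distribution lets U depend on (X, Y) only.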

Codebook: Using ω, generate a codebook C of $2^{nR_c}$ sequences
$u^n(j)$ independently according to the marginal distribution
p(u), namely $\prod_{i=1}^n p(u_i)$. Randomly and independently
assign each one a bin number $b(u^n(j))$ in the set $\{1, \ldots, 2^{nR}\}$.

Encoder: The encoding function $i(x^n,y^n,\omega)$ can be explained
as follows. Search the codebook C and identify an
index j such that $(x^n,y^n,u^n(j)) \in \mathcal{T}_{2\epsilon}^{(n)}$. If multiple exist,
select the first such j. If none exist, select j = 1. Send the
bin number $i(x^n,y^n,\omega) = b(u^n(j))$.

Decoder: The decoding function $u^n(i,y^n,z^n,\omega)$ can be
explained as follows. Consider the codebook C and identify
an index j such that $(y^n,z^n,u^n(j)) \in \mathcal{T}_{8\epsilon}^{(n)}$ and $b(u^n(j)) = i$.
If multiple exist, select the first such j. If none exist, select
j = 1. Produce the sequence $U^n = u^n(j)$.

Error Analysis: We conservatively declare errors for any of
the following, $E_1$, $E_2$, or $E_3$.

Error 1: The encoder does not find a (2ε)-jointly typical
sequence in the codebook. By the method of types one can
show, as in Lemma 10.6.2 of [33], that each sequence in C is
(2ε)-jointly typical with $(x^n,y^n)$ with probability greater than
$2^{-n(I(X,Y;U) + \delta_1(\epsilon))}$ for n large enough, where $\delta_1(\epsilon)$ goes to
zero as ε goes to zero.

Each sequence in the codebook C is generated independently,
so the probability that none of them are jointly typical
is bounded by

$$\begin{aligned} \Pr(E_1) &\le \left( 1 - 2^{-n(I(X,Y;U) + \delta_1(\epsilon))} \right)^{2^{nR_c}} \\ &\le e^{-2^{nR_c - n(I(X,Y;U) + \delta_1(\epsilon))}} \\ &= e^{-2^{n(\gamma/2 - \delta_1(\epsilon))}}. \end{aligned}$$

Error 2: The sequence identified by the encoder is not
(8ε)-jointly typical with $(x^n,y^n,z^n)$. Assuming $E_1$ did not
occur, because of the Markovity $Z - (X,Y) - U$ implied
by p(x,y,z)p(u|x,y) and the symmetry of our codebook
construction, we can invoke Theorem 12 to verify that the
conditional probability $\Pr(E_2|E_1^c)$ is arbitrarily small for large
enough n.

Error 3: The decoder finds more than one eligible action
sequence. Assume that $E_1$ and $E_2$ did not occur. If the decoder
considers the same index j as the encoder selected, then
certainly $u^n(j)$ will be eligible, which is to say it will
be (8ε)-jointly typical with $(y^n,z^n)$, and the bin index will
match the received message. For all other sequences in the
codebook C, an appeal to the property of iterated expectation
indicates that the probability of eligibility is slightly less than
the a priori probability that a randomly generated sequence
and bin number will yield eligibility (had you not known that
it was not the sequence selected by the encoder), which is
upper bounded by $2^{-nR}\, 2^{-n(I(Y,Z;U) - \delta_2(\epsilon))}$. Therefore, by the
method of types and the union bound,

$$\begin{aligned} \Pr(E_3|E_1^c,E_2^c) &\le 2^{nR_c}\, 2^{-nR}\, 2^{-n(I(Y,Z;U) - \delta_2(\epsilon))} \\ &= 2^{-n(R - R_c + I(Y,Z;U) - \delta_2(\epsilon))} \\ &= 2^{-n(I(Y,Z;U) - R_b - \delta_2(\epsilon))} \\ &= 2^{-n(\gamma/2 - \delta_2(\epsilon))}. \end{aligned}$$

Thus we can select $\delta(\epsilon) = \max\{2\delta_1(\epsilon), 2\delta_2(\epsilon), 8\epsilon\}$ to make
all error terms go to zero and satisfy the lemma. ■

With the result of Lemma 14 in mind, we can confidently
talk about using communication to establish coordination of
sequences across links in a network. Throughout the following
explanations we will no longer pay particular attention to the
ε in the ε-jointly typical set. Instead, we will simply make
reference to the generic jointly typical set, with the assumption
that ε is sufficiently small and n is sufficiently large.

3) Two nodes - Theorem 3: It is clear from Lemma 14
that an action sequence $Y^n$ jointly typical with $X^n$ can be
specified with high probability using any rate $R > I(X;Y)$.
With high probability $X^n$ will be a typical sequence. Apply
Lemma 14 with $Y = Z = \emptyset$.

4) Isolated node - Theorem 4: No proof is necessary, as
this is a special case of the cascade network with $R_2 = 0$.

5) Cascade - Theorem 5: The cascade network of Figure
8 has a sequence $X^n$ given by nature. The actions $X^n$
will be typical with high probability. Consider the desired
coordination p(y,z|x). A sequence $Z^n$ can be specified with
rate $R_Z > I(X;Z)$ to be jointly typical with $X^n$. This
communication is sent to node Y and forwarded on to node
Z. Additionally, now that every node knows $Z^n$, a sequence
$Y^n$ can be specified with rate $R_Y > I(X;Y|Z)$ and sent to
node Y. The rates used are

$$R_1 = R_Y + R_Z > I(X;Y,Z),$$
$$R_2 = R_Z > I(X;Z).$$
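The bound on the first rate in the cascade scheme is an instance of the chain rule for mutual information. As a brief worked step, writing $R_Y$ and $R_Z$ for the two component rates used in the scheme:

```latex
\begin{align*}
R_1 = R_Y + R_Z &> I(X;Y|Z) + I(X;Z) = I(X;Y,Z), \\
R_2 = R_Z &> I(X;Z).
\end{align*}
```

The first link carries both descriptions, while the second link only forwards the description of $Z^n$.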

6) Degraded source - Theorem 6: The degraded source
network of Figure 10 has a sequence $X^n$ given by nature,
known to node X, and another sequence $Y^n$, which is a
letter-by-letter function of $X^n$, known to node Y. Incidentally, $Y^n$ is
also known to node X because it is a function of the available
information. The actions $X^n$ and $Y^n$ will be jointly typical
with high probability.



Consider the desired coordination p(z|x,y) and choose a
distribution for the auxiliary random variable p(u|x,y,z) to
help achieve it. The encoder first specifies a sequence $U^n$
that is jointly typical with $X^n$ and $Y^n$. This requires a rate
$R_U > I(X,Y;U) = I(X;U)$, but with binning we only need
a rate of $R_1 > I(X;U|Y)$ to specify $U^n$ from node X to
node Y. Binning is not used when $U^n$ is forwarded to node
Z. Finally, after everyone knows $U^n$, the action sequence $Z^n$
jointly typical with $X^n$, $Y^n$, and $U^n$ is specified to node Z
at a rate of $R_2 > I(X,Y;Z|U) = I(X;Z|U)$. Thus, all rates
are achievable which satisfy

$$R_1 > I(X;U|Y),$$
$$R_2 > I(X;Z|U),$$
$$R_3 = R_U > I(X;U).$$

7) Broadcast - Theorem [7]: The broadcast network of Figure
[?] has a sequence $X^n$ given by nature, known to node X. The
action sequence $X^n$ will be typical with high probability.

Consider the desired coordination $p(y,z|x)$ and choose a
distribution for the auxiliary random variable $p(u|x,y,z)$ to
help achieve it. We will focus on achieving one corner point
of the pentagonal rate region. The encoder first specifies a
sequence $U^n$ that is jointly typical with $X^n$ using a rate
$R_U > I(X;U)$. This sequence is sent to both node Y and
node Z. After everyone knows $U^n$, the encoder specifies an
action sequence $Y^n$ that is jointly typical with $X^n$ and $U^n$
using rate $R_Y > I(X;Y|U)$. Finally, the encoder at node X,
knowing both $X^n$ and $Y^n$, can specify an action sequence
$Z^n$ that is jointly typical with $(X^n, Y^n, U^n)$ using a rate
$R_Z > I(X,Y;Z|U)$. This results in rates

$$R_1 = R_U + R_Y > I(X;U) + I(X;Y|U) = I(X;U,Y),$$
$$R_2 = R_U + R_Z > I(X;U) + I(X,Y;Z|U).$$

8) Cascade multiterminal - Theorem [8]: The cascade
multiterminal network of Figure [14] has a sequence $X^n$ given by
nature, known to node X, and another sequence $Y^n$ given by
nature, known to node Y. The actions $X^n$ and $Y^n$ will be
jointly typical with high probability.

Consider the desired coordination $p(z|x,y)$ and choose
a distribution for the auxiliary random variables $U$ and
$V$ according to the inner bound in Theorem [8]. That is,
$p(x,y,z,u,v) = p(x,y)p(u,v|x)p(z|y,u,v)$. We specify a
sequence $U^n$ to be jointly typical with $X^n$. By the Strong
Markov Lemma (Theorem [12]), in conjunction with the
symmetry of our random coding scheme and the Markovity of the
distribution $p(x,y)p(u|x)$, the sequence $U^n$ will be jointly
typical with the pair $(X^n, Y^n)$ with high probability. Using
binning, we only need a rate of $R_{U,1} > I(X;U|Y)$ to specify
$U^n$ from node X to node Y (as in Lemma [14]). However, we
cannot use binning for the message to node Z, so we send
the index of the codeword itself at a rate of $R_{U,2} > I(X;U)$.
Now that everyone knows the sequence $U^n$, it is treated as
side information.

A second auxiliary sequence $V^n$ is specified from node
X to node Y to be jointly typical with $(X^n, Y^n, U^n)$. This
scenario coincides exactly with Lemma [14], and a sufficient
rate is $R_V > I(X;V|U,Y)$. Finally, an action sequence $Z^n$
is specified from node Y to node Z to be jointly typical
with $(Y^n, V^n, U^n)$, where $U^n$ is side information known
to the encoder and decoder. We achieve this using a rate
$R_Z > I(Y,V;Z|U)$. Again, because of the symmetry of our
encoding scheme, the Strong Markov Lemma (Theorem [12])
tells us that $(X^n, Y^n, U^n, V^n, Z^n)$ will be jointly typical, and
therefore, $(X^n, Y^n, Z^n)$ will be jointly typical.
The rates used by this scheme are

$$R_1 = R_{U,1} + R_V > I(X;U,V|Y),$$
$$R_2 = R_{U,2} + R_Z > I(X;U) + I(Y,V;Z|U).$$

B. Empirical Coordination - Converse (Sections III, IV)

In proving outer bounds for the coordination capacity of
various networks, a common time mixing trick is to make use
of a random time variable $Q$ and then consider the value of a
random sequence $X^n$ at the random time $Q$ using notation
$X_Q$. We first make this statement precise and discuss the
implications of such a construction.

Consider a coordination code for a block length $n$. We
assign $Q$ to have a uniform distribution over the set $\{1, \ldots, n\}$,
independent of the action sequences in the network. The
variable $X_Q$ is simply a function of the sequence $X^n$ and the
variable $Q$; namely, the variable $X_Q$ takes on the value of the
$Q$th element in the sequence $X^n$. Even though all sequences of
actions and auxiliary variables in the network are independent
of $Q$, the variable $X_Q$ need not be independent of $Q$.

Here we list a couple of key properties of time mixing. 

Property 1: If all elements of a sequence $X^n$ are identically
distributed, then $X_Q$ is independent of $Q$. Furthermore, $X_Q$
has the same distribution as $X_1$. Verifying this property is
easy when one considers the conditional distribution of $X_Q$
given $Q$.

Property 2: For a collection of random sequences $X^n$, $Y^n$,
and $Z^n$, the expected joint type $E P_{X^n,Y^n,Z^n}$ is equal to the
joint distribution of the time-mixed variables $(X_Q, Y_Q, Z_Q)$.

$$\begin{aligned}
E P_{X^n,Y^n,Z^n}(x,y,z)
&= \sum_{x^n,y^n,z^n} p(x^n,y^n,z^n)\, P_{x^n,y^n,z^n}(x,y,z) \\
&= \sum_{x^n,y^n,z^n} p(x^n,y^n,z^n)\, \frac{1}{n} \sum_{q=1}^{n} \mathbf{1}\big((x_q,y_q,z_q) = (x,y,z)\big) \\
&= \frac{1}{n} \sum_{q=1}^{n} \sum_{x^n,y^n,z^n} p(x^n,y^n,z^n)\, \mathbf{1}\big((x_q,y_q,z_q) = (x,y,z)\big) \\
&= \frac{1}{n} \sum_{q=1}^{n} p_{X_q,Y_q,Z_q}(x,y,z) \\
&= \sum_{q=1}^{n} p_{X_Q,Y_Q,Z_Q|Q}(x,y,z|q)\, p(q) \\
&= p_{X_Q,Y_Q,Z_Q}(x,y,z).
\end{aligned}$$
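Property 2 can be checked numerically on a toy example. The sketch below uses two sequences instead of three for brevity and exact rational arithmetic; the function names `expected_joint_type` and `time_mixed_pmf` and the example pmf over length-2 sequences are hypothetical choices for this illustration.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def expected_joint_type(seq_pmf):
    """E[P_{X^n,Y^n}]: average the empirical joint type over the sequence pmf."""
    out = defaultdict(Fraction)
    for (xs, ys), prob in seq_pmf.items():
        n = len(xs)
        for pair, count in Counter(zip(xs, ys)).items():
            out[pair] += prob * Fraction(count, n)
    return dict(out)

def time_mixed_pmf(seq_pmf):
    """pmf of (X_Q, Y_Q), with Q uniform over positions and independent of the sequences."""
    joint_with_q = defaultdict(Fraction)  # pmf of (Q, X_Q, Y_Q)
    for (xs, ys), prob in seq_pmf.items():
        n = len(xs)
        for q in range(n):
            joint_with_q[(q, xs[q], ys[q])] += Fraction(1, n) * prob
    out = defaultdict(Fraction)
    for (q, x, y), prob in joint_with_q.items():
        out[(x, y)] += prob  # marginalize out Q
    return dict(out)

# Hypothetical pmf over pairs of length-2 sequences (not identically distributed in time).
seq_pmf = {
    ((0, 1), (0, 0)): Fraction(1, 2),
    ((1, 1), (1, 0)): Fraction(1, 2),
}
lhs = expected_joint_type(seq_pmf)
rhs = time_mixed_pmf(seq_pmf)
print(lhs == rhs)
```

The two dictionaries agree even though the sequence elements are not identically distributed, which is the content of Property 2.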

1) Two nodes - Theorem [3]: Assume that a rate-coordination
pair $(R, p(y|x))$ is in the interior of the coordination capacity
region $\mathcal{C}_{p_0}$ for the two-node network of Figure [5] with source
distribution $p_0(x)$. For a sequence of $(2^{nR}, n)$ coordination
codes that achieves $(R, p(y|x))$, consider the induced
distribution on the action sequences.

Recall that $I$ is the message from node X to node Y.

$$\begin{aligned}
nR &\ge H(I) \\
&\ge I(X^n;Y^n) \\
&= \sum_{q=1}^{n} I(X_q;Y^n|X^{q-1}) \\
&= \sum_{q=1}^{n} I(X_q;Y^n,X^{q-1}) \\
&\ge \sum_{q=1}^{n} I(X_q;Y_q) \\
&= nI(X_Q;Y_Q|Q) \\
&\overset{(a)}{=} nI(X_Q;Y_Q,Q) \\
&\ge nI(X_Q;Y_Q).
\end{aligned}$$

Equality (a) comes from Property 1 of time mixing.

We would like to be able to say that the joint distribution
of $X_Q$ and $Y_Q$ is arbitrarily close to $p_0(x)p(y|x)$ for some
$n$. That way we could conclude, by continuity of the entropy
function, that $R \ge I(X;Y)$.

The definition of achievability (Definition [5]) states that

$$\left\| P_{X^n,Y^n,Z^n}(x,y,z) - p_0(x)p(y,z|x) \right\|_{TV} \to 0 \text{ in probability}.$$

Because total variation is bounded, this implies that

$$E \left\| P_{X^n,Y^n,Z^n}(x,y,z) - p_0(x)p(y,z|x) \right\|_{TV} \to 0.$$

Furthermore, by the Jensen Inequality,

$$E P_{X^n,Y^n,Z^n}(x,y,z) \to p_0(x)p(y,z|x).$$

Now Property 2 of time mixing allows us to conclude the
argument for Theorem [3].

2) Isolated node - Theorem [4]: No proof is necessary, as
this is a special case of the cascade network with $R_2 = 0$.

3) Cascade - Theorem [5]: For the cascade network of Figure
[8] apply the bound from the two-node network twice: once to
show that the rate $R_1 \ge I(X;Y,Z)$ is needed even if node Y
and node Z are allowed to fully cooperate, and once to show
that the rate $R_2 \ge I(X;Z)$ is needed even if node X and node
Y are allowed to fully cooperate.

4) Degraded source - Theorem [6]: Assume that a
rate-coordination quadruple $(R_1, R_2, R_3, p(z|x,y))$ is in the
interior of the coordination capacity region $\mathcal{C}_{p_0}$ for the degraded
source network of Figure [10] with source distribution $p_0(x)$
and the degraded relationship $Y_i = f_0(X_i)$. For a sequence
of $(2^{nR_1}, 2^{nR_2}, 2^{nR_3}, n)$ coordination codes that achieves
$(R_1, R_2, R_3, p(z|x,y))$, consider the induced distribution on
the action sequences.

Recall that the message from node X to node Y at rate $R_1$
is labeled $I$, the message from node X to node Z at rate $R_2$
is labeled $J$, and the message from node Y to node Z at rate
$R_3$ is labeled $K$. We identify the auxiliary random variable $U$
as the collection of random variables $(K, X^{Q-1}, Q)$.
$$\begin{aligned}
nR_1 &\ge H(I) \\
&\ge H(I|Y^n) \\
&\overset{(a)}{=} H(I,K|Y^n) \\
&\ge H(K|Y^n) \\
&= I(X^n;K|Y^n) \\
&= \sum_{q=1}^{n} I(X_q;K|Y^n,X^{q-1}) \\
&= \sum_{q=1}^{n} I(X_q;K,X^{q-1},Y^{q-1},Y_{q+1}^n|Y_q) \\
&\ge \sum_{q=1}^{n} I(X_q;K,X^{q-1}|Y_q) \\
&= nI(X_Q;K,X^{Q-1}|Y_Q,Q) \\
&\overset{(b)}{=} nI(X_Q;U|Y_Q).
\end{aligned}$$

Equality (a) is justified because the message $K$ is a function of
the message $I$ and the sequence $Y^n$. Equality (b) comes from
Property 1 of time mixing.

$$\begin{aligned}
nR_2 &\ge H(J) \\
&\ge H(J|K) \\
&\overset{(a)}{=} H(J,Z^n|K) \\
&\ge H(Z^n|K) \\
&= I(X^n;Z^n|K) \\
&= \sum_{q=1}^{n} I(X_q;Z^n|K,X^{q-1}) \\
&\ge \sum_{q=1}^{n} I(X_q;Z_q|K,X^{q-1}) \\
&\overset{(b)}{=} nI(X_Q;Z_Q|K,X^{Q-1},Q) \\
&= nI(X_Q;Z_Q|U).
\end{aligned}$$

Equality (a) is justified because the action sequence $Z^n$ is a
function of the messages $J$ and $K$. Equality (b) comes from
Property 1 of time mixing.

$$\begin{aligned}
nR_3 &\ge H(K) \\
&= I(X^n;K) \\
&= \sum_{q=1}^{n} I(X_q;K|X^{q-1}) \\
&= nI(X_Q;K|X^{Q-1},Q) \\
&\overset{(a)}{=} nI(X_Q;K,X^{Q-1},Q) \\
&= nI(X_Q;U).
\end{aligned}$$

Equality (a) comes from Property 1 of time mixing.

As seen in the proof for the two-node network, the joint
distribution of $X_Q$, $Y_Q$, and $Z_Q$ is arbitrarily close to
$p_0(x)\mathbf{1}(y = f_0(x))p(z|x,y)$. Therefore, since $\mathcal{C}_{p_0}$ is a closed
set, $(R_1, R_2, R_3, p(z|x,y))$ is in the coordination capacity
region stated in Theorem [6].

It remains to bound the cardinality of $U$. We can use the
standard method rooted in the support lemma of [35]. The
variable $U$ should have $|\mathcal{X}||\mathcal{Z}| - 1$ elements to preserve the
joint distribution $p(x,z)$, which in turn preserves $p(x,y,z)$,
$H(X)$, and $H(X|Y)$, and three more elements to preserve
$H(X|U)$, $H(X|Y,U)$, and $H(X|Z,U)$.

5) Broadcast - Theorem [7]: For the broadcast network of
Figure [?] apply the bound from the two-node network three
times: once to show that the rate $R_1 \ge I(X;Y)$ is needed,
once to show that the rate $R_2 \ge I(X;Z)$ is needed, and finally
a third time to show that the sum-rate $R_1 + R_2 \ge I(X;Y,Z)$
is needed even if node Y and node Z are allowed to fully
cooperate.

6) Cascade multiterminal - Theorem [8]: Assume that a
rate-coordination triple $(R_1, R_2, p(z|x,y))$ is in the interior of the
coordination capacity region $\mathcal{C}_{p_0}$ for the cascade multiterminal
network of Figure [14] with source distribution $p_0(x,y)$. For a
sequence of $(2^{nR_1}, 2^{nR_2}, n)$ coordination codes that achieves
$(R_1, R_2, p(z|x,y))$, consider the induced distribution on the
action sequences.

Recall that the message from node X to node Y at rate $R_1$ is
labeled $I$, and the message from node Y to node Z at rate $R_2$ is
labeled $J$. We identify the auxiliary random variable $U$ as the
collection of random variables $(I, X^{Q-1}, Y^{Q-1}, Y_{Q+1}^n, Q)$.
This is the same choice of auxiliary variable used by Wyner
and Ziv [27]. Notice that $U$ satisfies the Markov chain
properties $U - X_Q - Y_Q$ and $X_Q - (Y_Q, U) - Z_Q$.

$$\begin{aligned}
nR_1 &\ge H(I) \\
&\ge H(I|Y^n) \\
&= I(X^n;I|Y^n) \\
&= \sum_{q=1}^{n} I(X_q;I|Y^n,X^{q-1}) \\
&= \sum_{q=1}^{n} I(X_q;I,X^{q-1},Y^{q-1},Y_{q+1}^n|Y_q) \\
&= nI(X_Q;I,X^{Q-1},Y^{Q-1},Y_{Q+1}^n|Y_Q,Q) \\
&\overset{(a)}{=} nI(X_Q;I,X^{Q-1},Y^{Q-1},Y_{Q+1}^n,Q|Y_Q) \\
&= nI(X_Q;U|Y_Q).
\end{aligned}$$

Equality (a) comes from Property 1 of time mixing.

$$\begin{aligned}
nR_2 &\ge H(J) \\
&\ge I(X^n,Y^n;Z^n) \\
&= \sum_{q=1}^{n} I(X_q,Y_q;Z^n|X^{q-1},Y^{q-1}) \\
&= \sum_{q=1}^{n} I(X_q,Y_q;Z^n,X^{q-1},Y^{q-1}) \\
&\ge \sum_{q=1}^{n} I(X_q,Y_q;Z_q) \\
&= nI(X_Q,Y_Q;Z_Q|Q) \\
&\overset{(a)}{=} nI(X_Q,Y_Q;Z_Q,Q) \\
&\ge nI(X_Q,Y_Q;Z_Q).
\end{aligned}$$

Equality (a) comes from Property 1 of time mixing.

As seen in the proof for the two-node network, the
joint distribution of $X_Q$, $Y_Q$, and $Z_Q$ is arbitrarily close
to $p_0(x,y)p(z|x,y)$. Therefore, since $\mathcal{C}_{p_0}$ is a closed set,
$(R_1, R_2, p(z|x,y))$ is in the coordination capacity region
stated in Theorem [8].

It remains to bound the cardinality of $U$. We can again
use the standard method of [35]. Notice that $p(x,y,z|u) =
p(x|u)p(y|x)p(z|y,u)$ captures all of the Markovity
constraints of the outer bound. Therefore, convex mixtures of
distributions of this form are valid for achieving points in
the outer bound. The variable $U$ should have $|\mathcal{X}||\mathcal{Y}||\mathcal{Z}| - 1$
elements to preserve the joint distribution $p(x,y,z)$, which
in turn preserves $I(X,Y;Z)$ and $H(X|Y)$, and one more
element to preserve $H(X|Y,U)$.

C. Strong Coordination ( Section 

1) No communication - Theorem [9]: The network of Figure
[16] with no communication generalizes Wyner's common
information work [9] to three nodes. Here we provide a sketch
of the proof.

The following phenomenon was noticed both by Wyner [9]
and by Han and Verdú [14]. Consider a memoryless channel
$p(x|u)$. A channel input with distribution $p(u)$ induces an
output with distribution $p(x) = \sum_u p(u)p(x|u)$. If the inputs
are i.i.d. then the outputs are i.i.d. as well. Now suppose that
instead a channel input sequence $U^n$ is chosen uniformly at
random from a set $\mathcal{M}$ of $2^{nR}$ deterministic sequences. If
$R > I(X;U)$ then the set $\mathcal{M}$ can be chosen so that the output
distribution is arbitrarily close in total variation to the i.i.d.
distribution $\prod_{i=1}^{n} p(x_i)$ for large enough $n$.
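The phenomenon can be explored by brute force for very short block lengths. The sketch below computes the exact output distribution induced by a uniformly chosen codeword and its total variation distance from the i.i.d. target. The function names, the binary symmetric channel with crossover 0.1, and the two codebooks are assumptions made for illustration; in particular, the full codebook is simply a case where the distance is exactly zero, not a construction from the cited works.

```python
import itertools
import math

def induced_output_pmf(codebook, channel):
    """Output pmf when U^n is uniform over `codebook` and sent through a
    memoryless channel, with channel[u][x] = p(x|u)."""
    n = len(codebook[0])
    alphabet = range(len(channel[0]))
    pmf = {}
    for xs in itertools.product(alphabet, repeat=n):
        total = 0.0
        for us in codebook:
            total += math.prod(channel[u][x] for u, x in zip(us, xs))
        pmf[xs] = total / len(codebook)
    return pmf

def tv_to_iid(pmf, p_x):
    """Total variation distance between `pmf` and the i.i.d. product of p_x."""
    return 0.5 * sum(
        abs(prob - math.prod(p_x[x] for x in xs)) for xs, prob in pmf.items()
    )

# Hypothetical setup: binary symmetric channel, uniform input, block length 3.
bsc = [[0.9, 0.1], [0.1, 0.9]]
n = 3
target = [0.5, 0.5]  # p(x) induced by a uniform p(u)

full_codebook = list(itertools.product((0, 1), repeat=n))  # rate of 1 bit
single_codeword = [(0, 0, 0)]                              # rate of 0 bits

tv_full = tv_to_iid(induced_output_pmf(full_codebook, bsc), target)
tv_single = tv_to_iid(induced_output_pmf(single_codeword, bsc), target)
print(tv_full, tv_single)
```

Intermediate codebook sizes can be tried by passing any list of length-n tuples as the codebook.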

Figure [22] illustrates how to achieve the strong coordination
capacity region $\mathcal{C}$ of Theorem [9]. Let each decoder simulate a
memoryless channel from $U$ to $X$, $Y$, or $Z$, depending on the
particular node. The common randomness $\omega$ is used to index
a sequence $U^n(\omega)$ that is used as the inputs to the channels.
Notice that the action sequences $X^n$, $Y^n$, and $Z^n$ produced
via these three separate channels are distributed the same as
if they were generated as outputs of a single channel because
$p(x,y,z|u) = p(x|u)p(y|u)p(z|u)$ according to the definition
of $\mathcal{C}$ in the theorem. Since $R > I(X,Y,Z;U)$ for points in
the interior of $\mathcal{C}$, this scheme will achieve strong coordination.

Fig. 22. Achievability for no-communication network. The strong
coordination capacity region $\mathcal{C}$ of Theorem [9] is achieved in a network with no
communication by using the common randomness to specify a sequence
$U^n(\omega)$ that is then passed through a memoryless channel at each node using
private randomness.

For the converse, identify the auxiliary variable $U$ as $\omega$ and
notice that $X_q$, $Y_q$, and $Z_q$ are conditionally independent (for
all $q$) given $\omega$.

$$\begin{aligned}
nR &\ge H(\omega) \\
&\ge I(X^n,Y^n,Z^n;\omega) \\
&\ge I(X^n,Y^n,Z^n;U).
\end{aligned}$$

Since $X^n$, $Y^n$, and $Z^n$ have a joint distribution close in total
variation to the i.i.d. distribution $\prod_{i=1}^{n} p(x_i,y_i,z_i)$, it can be
shown that they can essentially be treated as i.i.d. sequences
in the mutual information bounds (see [30]). If they were i.i.d.
we would have

$$\begin{aligned}
I(X^n,Y^n,Z^n;U)
&= \sum_{q=1}^{n} I(X_q,Y_q,Z_q;U|X^{q-1},Y^{q-1},Z^{q-1}) \\
&= \sum_{q=1}^{n} I(X_q,Y_q,Z_q;U,X^{q-1},Y^{q-1},Z^{q-1}) \\
&\ge \sum_{q=1}^{n} I(X_q,Y_q,Z_q;U) \\
&\ge n \min_{U} I(X,Y,Z;U),
\end{aligned}$$

where the minimization is over all eligible auxiliary $U$ that
separate $X$, $Y$, and $Z$ into conditional independence.

It remains to bound the cardinality of $U$. We can again
use the standard method of [35]. The variable $U$ should
have $|\mathcal{X}||\mathcal{Y}||\mathcal{Z}| - 1$ elements to preserve the joint distribution
$p(x,y,z)$, which in turn preserves $H(X,Y,Z)$, and one more
element to preserve $H(X,Y,Z|U)$.

2) Two nodes - Theorem [10]: The strong coordination
capacity region for the two-node network of Figure [18] is the
main result of [30]:

$$\mathcal{C}_{p_0} = \left\{ (R_0, R, p(y|x)) :
\begin{array}{l}
\exists\, p(u|x,y) \text{ such that} \\
p(x,y,u) = p(u)p(x|u)p(y|u), \\
|\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}| + 1, \\
R \ge I(X;U), \\
R_0 + R \ge I(X,Y;U)
\end{array} \right\}, \tag{13}$$

where $R_0$ refers to the rate of common randomness, and $R$
refers to the communication rate.

In the case of no common randomness ($R_0 = 0$), the
stronger inequality in (13) on the rate $R$ becomes the second,
$R \ge I(X,Y;U)$. Because of the Markov constraint on $U$,
the minimum value of the right-hand side of this inequality is
Wyner's common information $C(X;Y)$.

Additionally, Theorem [10] states that if $R_0$ is greater than
the necessary conditional entropy $H(Y \dagger X)$ then rates $R >
I(X;Y)$ are sufficient for achieving strong coordination. This
is a straightforward application of the definition of $H(Y \dagger X)$.
We can verify this with the following choice of $U$:

$$U = \operatorname*{arg\,min}_{f(Y) \,:\, X - f(Y) - Y} H(f(Y)|X).$$

Notice that this choice of $U$ separates $X$ and $Y$ into a Markov
chain by definition. Also, the mutual information $I(X;U)$ is
less than or equal to $I(X;Y)$, since $U$ is a function of $Y$,
thus satisfying the first rate inequality in (13). The second
inequality is satisfied because of the chain rule,

$$\begin{aligned}
I(X,Y;U) &= I(X;U) + I(Y;U|X) \\
&= I(X;U) + H(Y \dagger X) \\
&\le I(X;Y) + H(Y \dagger X).
\end{aligned}$$



Furthermore, we can show that this is the least amount
of common randomness needed to fully expand the strong
coordination capacity region. In other words, the minimum
$R_0$ such that $(R_0, I(X;Y))$ is in the strong rate-coordination
region for $p_0(x)p(y|x)$ is $H(Y \dagger X)$.

To prove this, first consider the implications of $R =
I(X;Y)$. This means that in order to satisfy the first rate
inequality in (13), we must have $I(X;U) \le I(X;Y)$. However,
because of the Markovity, $I(X;U) = I(X;U,Y)$. Therefore,
$I(X;U|Y) = 0$, which implies a second Markov condition
$X - Y - U$ in addition to $X - U - Y$.

We are concerned with minimizing the required rate of
common randomness $R_0$. Since $R = I(X;Y)$, the second rate
inequality in (13) becomes $R_0 \ge I(Y;U|X)$. The conditional
entropy $H(Y|X)$ is fixed, so we want to maximize the
conditional entropy $H(Y|U,X)$.

With the distribution $p(x|y)$ in mind, we can clump values
of $Y$ together for which the channel from $Y$ to $X$ is identical.
Define a function $f$ with the property that

$$f(y) = f(\tilde{y}) \iff p(x|y) = p(x|\tilde{y}) \quad \forall x \in \mathcal{X}. \tag{14}$$

Letting $\hat{U} = f(Y)$ gives the choice of $U$ that simultaneously
maximizes $H(Y|U,X)$ and satisfies the Markov conditions
$X - U - Y$ and $X - Y - U$. We can compare $\hat{U}$ to any
other choice $U$ that satisfies the conditions and show that the
resulting conditional entropy $H(Y|U,X)$ is no larger.

Another way to state the two Markov conditions is that
for all values of $y$ and $u$ such that $p(y,u) > 0$, the
conditional distributions $p(x|y)$ and $p(x|u)$ are equal because
$p(x|y) = p(x|y,u) = p(x|u)$. Notice that the value of
$\hat{U} = f(Y)$, characterized in (14), only depends on the channel
$p(x|y)$. However, with probability one the value of $\hat{U}$ can
be determined from $U$ based on the conditional distribution
$p(x|u)$. Therefore,

$$\begin{aligned}
H(Y|U,X) &= H(Y,\hat{U}|U,X) \\
&= H(Y|\hat{U},U,X) + H(\hat{U}|U,X) \\
&= H(Y|\hat{U},U,X) \\
&\le H(Y|\hat{U},X).
\end{aligned}$$
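The clumping construction can be sketched in a few lines: group the values of $Y$ whose channel rows $p(\cdot|y)$ coincide, call the group label $f(Y)$, and evaluate $H(f(Y)|X)$. The function name `necessary_conditional_entropy`, the rounding to 12 decimals used to compare rows, and the example distribution are all assumptions of this illustration rather than anything specified in the paper.

```python
import math
from collections import defaultdict

def necessary_conditional_entropy(p_xy):
    """Given a joint pmf {(x, y): prob}, clump y values with identical
    p(x|y) into U = f(Y) and return H(U|X) in bits."""
    p_y = defaultdict(float)
    for (x, y), prob in p_xy.items():
        p_y[y] += prob
    xs = sorted({x for x, _ in p_xy})

    # Label each y by its channel row p(.|y); rounding is an assumed
    # tolerance for deciding that two rows are identical.
    label = {}
    for y, py in p_y.items():
        if py > 0:
            label[y] = tuple(round(p_xy.get((x, y), 0.0) / py, 12) for x in xs)

    p_xu = defaultdict(float)
    p_x = defaultdict(float)
    for (x, y), prob in p_xy.items():
        if prob > 0:
            p_xu[(x, label[y])] += prob
            p_x[x] += prob

    return -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xu.items() if p > 0)

# Hypothetical example: X is a fair bit, Y = (B, T) where B is X flipped with
# probability 0.25 and T is an independent fair coin that X does not depend on.
p_xy = {}
for x in (0, 1):
    for b in (0, 1):
        for t in (0, 1):
            p_xy[(x, (b, t))] = 0.5 * (0.75 if b == x else 0.25) * 0.5

h_dagger = necessary_conditional_entropy(p_xy)
print(h_dagger)
```

In this example the coin T is clumped away, so the result is the binary entropy of 0.25 rather than the larger $H(Y|X)$, which also counts the one bit of T.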

D. Rate-distortion theory ( Sections I WD 

We establish the relationship from Theorem [11] between the
coordination capacity region and the rate-distortion region in
two parts. First we show that $\mathcal{D}_{p_0}$ contains $A\mathcal{C}_{p_0}$ and then the
other way around. To keep clutter to a minimum and without
loss of generality, we only discuss a single distortion measure
$d$, rate $R$, and a pair of sequences of actions $X^n$ and $Y^n$.

1) Coordination implies distortion ($\mathcal{D}_{p_0} \supseteq A\mathcal{C}_{p_0}$): The
distortion incurred with respect to a distortion function $d$ on
a set of sequences of actions is a function of the joint type of
the sequences. That is,

$$\begin{aligned}
d^{(n)}(x^n,y^n) &= \frac{1}{n} \sum_{i=1}^{n} d(x_i,y_i) \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{x,y} \mathbf{1}(x_i = x, y_i = y)\, d(x,y) \\
&= \sum_{x,y} d(x,y)\, \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(x_i = x, y_i = y) \\
&= \sum_{x,y} d(x,y)\, P_{x^n,y^n}(x,y) \\
&= E_{P_{x^n,y^n}} d(X,Y).
\end{aligned} \tag{15}$$
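The identity between average distortion and expected distortion under the joint type is easy to confirm on a short example. This is a minimal sketch; the function names, the Hamming distortion, and the two sequences are hypothetical choices for illustration.

```python
from collections import Counter

def joint_type(xs, ys):
    """Empirical joint distribution (joint type) of two equal-length sequences."""
    n = len(xs)
    return {pair: count / n for pair, count in Counter(zip(xs, ys)).items()}

def average_distortion(xs, ys, d):
    """Per-letter average distortion (1/n) * sum_i d(x_i, y_i)."""
    return sum(d(x, y) for x, y in zip(xs, ys)) / len(xs)

def distortion_from_type(xs, ys, d):
    """Expected distortion E_P d(X, Y) where P is the joint type."""
    return sum(prob * d(x, y) for (x, y), prob in joint_type(xs, ys).items())

def hamming(x, y):
    return 0 if x == y else 1

xs = [0, 1, 1, 0, 1]
ys = [0, 1, 0, 0, 0]

direct = average_distortion(xs, ys, hamming)
via_type = distortion_from_type(xs, ys, hamming)
print(direct, via_type)
```

Both computations give two mismatches out of five positions, since the distortion depends on the sequences only through their joint type.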



When a rate-coordination tuple $(R, p(x,y))$ is in the interior
of the coordination capacity region $\mathcal{C}_{p_0}$, we are assured the
existence of a coordination code for any $\epsilon > 0$ for which

$$\Pr\left( \| P_{X^n,Y^n} - p \|_{TV} > \epsilon \right) < \epsilon.$$

Therefore, with probability greater than $1 - \epsilon$,

$$E_{P_{X^n,Y^n}} d(X,Y) \le E_p d(X,Y) + \epsilon d_{max}.$$

Recalling (15) yields

$$E d^{(n)} \le E_p d(X,Y) + 2\epsilon d_{max}.$$

As expected, a sequence of $(2^{nR}, n)$ coordination codes
that achieves empirical coordination for the joint distribution
$p(x,y)$ also achieves the point in the rate-distortion region
with the same rate and with distortion value $E_p d(X,Y)$.
2) Distortion implies coordination ($\mathcal{D}_{p_0} \subseteq A\mathcal{C}_{p_0}$):
Suppose that a $(2^{nR}, n)$ rate-distortion code achieves distortion
$E d^{(n)}(X^n,Y^n) \le D$. Substituting from (15),

$$E\left[ E_{P_{X^n,Y^n}} d(X,Y) \right] \le D.$$

However,

$$E\left[ E_{P_{X^n,Y^n}} d(X,Y) \right] = E_{E P_{X^n,Y^n}} d(X,Y)$$

by linearity.

We can achieve the rate-coordination pair $(R, E P_{X^n,Y^n})$ by
augmenting the rate-distortion code. If we repeat the use of
the rate-distortion code over $k$ blocks of length $n$ each, then
we induce a joint distribution on $(X^{kn}, Y^{kn})$ that consists of
i.i.d. sub-blocks $(X^n, Y^n), \ldots, (X_{kn-n+1}^{kn}, Y_{kn-n+1}^{kn})$ denoted
as $(X^{(1)n}, Y^{(1)n}), \ldots, (X^{(k)n}, Y^{(k)n})$.

By the weak law of large numbers,

$$P_{X^{kn},Y^{kn}} = \frac{1}{k} \sum_{i=1}^{k} P_{X^{(i)n},Y^{(i)n}} \to E P_{X^n,Y^n} \text{ in probability}.$$

Point-wise convergence in probability implies that as $k$ grows

$$\left\| P_{X^{kn},Y^{kn}} - E P_{X^n,Y^n} \right\|_{TV} \to 0 \text{ in probability}.$$

Thus, for any point $(R, D)$ in the rate-distortion region
we have identified an associated point $(R, E P_{X^n,Y^n})$ in the
coordination-capacity region. Indeed, the rate-distortion region
is a linear projection of the coordination-capacity region.

VIII. Remarks 

Rather than inquire about the possibility of moving data
in a network, we have asked for the set of all achievable
joint distributions on actions at the nodes. For some
three-node networks we have fully characterized the answer to this
question, while for others we have established bounds.

Some of the results discussed in this work extend nicely
to larger networks. Consider for example an extended cascade
network shown in Figure [23] where $X$ is given randomly by
nature and $Y_1$ through $Y_{k-1}$ are actions based on a cascade
of communication. Just as in the cascade network of Section
III-C, we can achieve rates $R_i \ge I(X; Y_i, \ldots, Y_{k-1})$ for empirical
coordination by sending messages to the last nodes in the chain
first and conditioning later messages on earlier ones. These
rates meet the cut-set bound. We now can make an interesting
observation about assigning unique tasks to nodes in such a
network. Suppose $k$ tasks are to be completed by the $k$ nodes
in this cascade network, one at each node. Node X is assigned
a task randomly, and the communication in the network is used
to assign a permutation of all the tasks to the nodes in the
network. The necessary rates in the network are $R_i \ge \log(k/i)$.
The sum of all the rates in the network, for large $k$, is then
approximately $R_{total} \ge k$ nats, where $k$ is the number of tasks
and nodes in the network.
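The linear growth can be checked by direct summation. The sketch below assumes per-link rates of $\ln(k/i)$ nats for $i = 1, \ldots, k-1$, which is how the rate expression above is read here; the function name `cascade_task_rates` is hypothetical.

```python
import math

def cascade_task_rates(k):
    """Assumed per-link rates ln(k/i) in nats for links i = 1, ..., k-1."""
    return [math.log(k / i) for i in range(1, k)]

k = 1000
cascade_total = sum(cascade_task_rates(k))
print(cascade_total, cascade_total / k)
```

By Stirling's approximation the sum equals $k \ln k - \ln k!$, which is close to $k$ for large $k$, so the ratio printed above approaches one.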

Fig. 23. Extended cascade network. This is an extension of the cascade
network of Section III-C. Action $X$ is given randomly by nature according to
$p_0(x)$, and a cascade of communication is used to produce actions $Y_1$ through
$Y_{k-1}$. The coordination capacity region contains all rate-coordination tuples
that satisfy $R_i \ge I(X; Y_i, \ldots, Y_{k-1})$ for all $i$. In particular, the sum rate needed
to assign a permutation of $k$ tasks to the $k$ nodes grows linearly with the
number of nodes.

Now consider the same task assignment scenario for an
extended broadcast network shown in Figure [24]. Here again $X$
is given randomly by nature, but $Y_1$ through $Y_{k-1}$ are actions
based on individual messages sent to each of the nodes. Again,
we want to assign a permutation of all the $k$ tasks to all of the
$k$ nodes. We can use ideas from the broadcast network results
of Section IV-A. For example, let us assign default tasks to the
nodes so that $Y_1 = 1, \ldots, Y_{k-1} = k - 1$ unless told otherwise.
Now the communication is simply used to tell each node when
it must choose task $k$ rather than the default task, which will
happen about one time out of $k$. The rates needed for this
scheme are $R_i \ge H(1/k)$, where $H$ is the binary entropy
function. For large $k$, the sum of all the rates in the network
is approximately $R_{total} \ge \ln k + 1$ nats. The cut-set bound
gives us a lower bound on the sum rate of $R_{total} \ge \ln k$ nats.
Therefore, we can conclude that the optimal sum rate scales
with the logarithm of the number of nodes in the network.
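The logarithmic estimate for the default-task scheme can also be evaluated directly. This sketch assumes each of the $k-1$ peripheral nodes receives a binary flag that is set with probability $1/k$; the function names are hypothetical.

```python
import math

def binary_entropy_nats(p):
    """Binary entropy function in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def broadcast_task_sum_rate(k):
    """Sum rate in nats when each of the k-1 nodes gets a flag of rate H(1/k)."""
    return (k - 1) * binary_entropy_nats(1.0 / k)

k = 1000
scheme_total = broadcast_task_sum_rate(k)
cut_set_bound = math.log(k)
print(scheme_total, cut_set_bound + 1.0, cut_set_bound)
```

For this value of $k$ the scheme's sum rate sits just below $\ln k + 1$ and above the cut-set value $\ln k$, consistent with logarithmic scaling.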




Fig. 24. Extended broadcast network. This is an extension of the broadcast
network of Section IV-A. Action $X$ is given randomly by nature according to
$p_0(x)$, and each peripheral node produces an action $Y_i$ based on an individual
message at rate $R_i$. Bounds on the coordination capacity region show that
the sum rate needed to assign a permutation of $k$ tasks to the $k$ nodes grows
logarithmically with the number of nodes.

Even without explicitly knowing the coordination capacity 
region for the broadcast network, we are able to use bounds 
to establish the scaling laws for the total rate needed to assign 
tasks uniquely, and we can compare the efficiency of the 
broadcast network (logarithmic in the network size) with that 
of the cascade network (linear in the network size) for this 
kind of coordination. 

We would also like to understand the coordination capacity
region for a noisy network. For example, the communication
capacity region for the broadcast channel $p(y_1, y_2|x)$ of Figure
[25] has undergone serious investigation. The standard question
is, how many bits of independent information can be
communicated from $X$ to $Y_1$ and from $X$ to $Y_2$? We know the
answer if the broadcast channel is degraded; that is, if $Y_2$
can be viewed as a noisy version of $Y_1$. We also know the
answer if the channel can be separated into two orthogonal
channels or is deterministic. But what if instead we are trying
to coordinate actions via the broadcast channel, similar to the
broadcast network of Section IV-A? Now we care about the
dependence between $Y_1$ and $Y_2$. The broadcast channel will
impose a natural dependence between the channel outputs
$Y_1$ and $Y_2$ that we abolish if we try to send independent
information to the two nodes. After all, the communication
capacity region for the broadcast channel depends only on the
marginals $p(y_1|x)$ and $p(y_2|x)$. Here we are wasting a valuable
resource: the natural conditional dependence between $Y_1$ and
$Y_2$ given $X$.



Fig. 25. Broadcast channel. When a noisy channel is used to coordinate
joint actions $(X, Y_1, Y_2)$, what is the resulting coordination capacity region?
The broadcast network of Section IV-A is a noiseless special case.



Again, we are enlarging the focus from communication of 
independent information to the creation of coordinated actions. 
This larger question may force a simpler solution and illu- 
minate the problem of independent information (the standard 
channel capacity formulation) as a special case. Presumably, 
information is being communicated for a reason — so future 
cooperative behavior can be achieved. 

IX. Final Remarks 

At first it seems that the nodes in a network can cooperate 
arbitrarily without communication. Prior arrangement achieves 
that. Also common randomness achieves it. 

But the problem changes dramatically when some of the 
nodes take actions specified by nature. Now some communi- 
cation to the remaining nodes becomes necessary to establish 
the desired dependence. 

We have established the rate-dependence tradeoff for
cascade networks and isolated node networks found in Section III.
The broadcast network of Figure [?] remains elusive, perhaps
for the same reason that the broadcast channel is difficult.